This is an R Markdown document. Instructions for writing these documents and background information can be found in the book R Markdown: The Definitive Guide. When you execute code within the document, the results appear beneath the code.

1 show data set

The data set is the campus file of the IQB-Ländervergleich 2011 for primary school (access via ); the meaning of the variables can be looked up with the search function of the scale manual (Skalenhandbuch).

dim(datenLV)
## [1] 3005   33
knitr::kable(datenLV[1:4,], digits = 2)
| idsch_FDZ | idstud_FDZ | tr_sex | tr_age | Emigr | EDezh | EHisei | EHisced_akt | SBuecher | SLesZt | tr_NotDe | tr_NotMa | tr_Wdh_r | SSkDe_a | SSkDe_b | SSkDe_c | SSkDe_d | SSkMa_a | SSkMa_b | SSkMa_c | SSkMa_d | SBezMs_a | SBezMs_b | SBezMs_c | SBezMs_d | SSkDe | SSkMa | SBezMs | wle_lesen | wle_hoeren | wle_mathe | schoolEconDis | schoolMiganteil |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | maennlich | 10.42 | keinMig | Kind spricht zu Hause immer oder fast immer Deutsch | 55 | ISCED level 5B | 100 Buecher | 30 Minuten bis zu einer Stunde | 2 | 2 | nein | 4 | 1 | 4 | 4 | 3 | 3 | 3 | 4 | 4 | 3 | 1 | 1 | 4.00 | 3.00 | 3.75 | -0.17 | -0.63 | -0.33 | 33-66% oekonomisch mittel | > 20% Miganteil |
| 1 | 2 | maennlich | 9.83 | Mig | Kind spricht zu Hause immer oder fast immer Deutsch | 49 | ISCED level 5B | mehr 200 Buecher | 30 Minuten bis zu einer Stunde | 2 | 2 | nein | 2 | 2 | 2 | 2 | 3 | 1 | 3 | 3 | 3 | 3 | 1 | 1 | 2.25 | 3.25 | 3.50 | -0.44 | -0.98 | 0.67 | 33-66% oekonomisch mittel | > 20% Miganteil |
| 1 | 3 | maennlich | 10.50 | NA | Kind spricht zu Hause nie oder manchmal Deutsch | NA | NA | 25 Buecher | 30 Minuten bis zu einer Stunde | 4 | 3 | nein | 3 | 1 | 4 | 3 | 3 | 1 | 3 | 3 | 4 | 1 | 1 | 2 | 3.50 | 3.25 | 3.00 | -1.21 | -1.06 | -0.67 | 33-66% oekonomisch mittel | > 20% Miganteil |
| 1 | 4 | maennlich | 10.67 | NA | NA | NA | NA | 10 Buecher | weniger als 30 Minuten | 2 | 3 | nein | 3 | 3 | 4 | 4 | 4 | 1 | 4 | 4 | 1 | 4 | 4 | 4 | 3.25 | 4.00 | 1.75 | 0.26 | -1.17 | 0.14 | 33-66% oekonomisch mittel | > 20% Miganteil |
summary(datenLV)
##    idsch_FDZ       idstud_FDZ         tr_sex         tr_age      
##  Min.   :  1.0   Min.   :   1   maennlich:1530   Min.   : 6.833  
##  1st Qu.: 53.0   1st Qu.: 752   weiblich :1475   1st Qu.:10.083  
##  Median :104.0   Median :1503                    Median :10.417  
##  Mean   :103.3   Mean   :1503                    Mean   :10.425  
##  3rd Qu.:155.0   3rd Qu.:2254                    3rd Qu.:10.750  
##  Max.   :201.0   Max.   :3005                    Max.   :13.000  
##                                                  NA's   :7       
##      Emigr                                                      EDezh     
##  Mig    : 493   Kind spricht zu Hause immer oder fast immer Deutsch:2346  
##  keinMig:1966   Kind spricht zu Hause nie oder manchmal Deutsch    : 191  
##  NA's   : 546   NA's                                               : 468  
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##      EHisei              EHisced_akt               SBuecher   
##  Min.   :10.00   ISCED level 1 :  23   10 Buecher      : 153  
##  1st Qu.:37.00   ISCED level 2 :  81   25 Buecher      : 557  
##  Median :48.00   ISCED level 3A:  10   100 Buecher     :1112  
##  Mean   :49.57   ISCED level 5B:1556   200 Buecher     : 563  
##  3rd Qu.:61.00   ISCED level 5A: 724   mehr 200 Buecher: 550  
##  Max.   :89.00   ISCED level 6 : 126   NA's            :  70  
##  NA's   :622     NA's          : 485                          
##                             SLesZt        tr_NotDe       tr_NotMa    
##  weniger als 30 Minuten        : 862   Min.   :1.00   Min.   :1.000  
##  30 Minuten bis zu einer Stunde:1173   1st Qu.:2.00   1st Qu.:2.000  
##  1-2 Stunden                   : 475   Median :2.00   Median :2.000  
##  2 Stunden oder mehr           : 398   Mean   :2.46   Mean   :2.509  
##  NA's                          :  97   3rd Qu.:3.00   3rd Qu.:3.000  
##                                        Max.   :5.00   Max.   :5.000  
##                                        NA's   :131    NA's   :127    
##  tr_Wdh_r       SSkDe_a         SSkDe_b         SSkDe_c         SSkDe_d     
##  nein:2830   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  ja  : 168   1st Qu.:3.000   1st Qu.:1.000   1st Qu.:3.000   1st Qu.:3.000  
##  NA's:   7   Median :3.000   Median :2.000   Median :3.000   Median :3.000  
##              Mean   :3.116   Mean   :2.126   Mean   :3.303   Mean   :3.334  
##              3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000  
##              Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##              NA's   :92      NA's   :121     NA's   :126     NA's   :123    
##     SSkMa_a        SSkMa_b         SSkMa_c         SSkMa_d         SBezMs_a    
##  Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.00   1st Qu.:1.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000  
##  Median :3.00   Median :2.000   Median :3.000   Median :4.000   Median :3.000  
##  Mean   :3.17   Mean   :2.065   Mean   :3.314   Mean   :3.331   Mean   :3.365  
##  3rd Qu.:4.00   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :4.00   Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##  NA's   :82     NA's   :115     NA's   :121     NA's   :102     NA's   :164    
##     SBezMs_b       SBezMs_c        SBezMs_d         SSkDe           SSkMa      
##  Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.00   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:2.750   1st Qu.:2.750  
##  Median :3.00   Median :1.000   Median :1.000   Median :3.250   Median :3.250  
##  Mean   :3.11   Mean   :1.633   Mean   :1.493   Mean   :3.156   Mean   :3.187  
##  3rd Qu.:4.00   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:3.750   3rd Qu.:4.000  
##  Max.   :4.00   Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##  NA's   :199    NA's   :173     NA's   :172     NA's   :95      NA's   :85     
##      SBezMs        wle_lesen          wle_hoeren        wle_mathe      
##  Min.   :1.000   Min.   :-5.06686   Min.   :-5.8078   Min.   :-3.4768  
##  1st Qu.:3.000   1st Qu.:-0.68091   1st Qu.:-0.5796   1st Qu.:-0.6384  
##  Median :3.500   Median : 0.12715   Median : 0.1526   Median : 0.1035  
##  Mean   :3.335   Mean   : 0.09367   Mean   : 0.1094   Mean   : 0.1061  
##  3rd Qu.:3.750   3rd Qu.: 0.88132   3rd Qu.: 0.8188   3rd Qu.: 0.8379  
##  Max.   :4.000   Max.   : 4.24000   Max.   : 3.5043   Max.   : 4.7832  
##  NA's   :145                                                           
##                         schoolEconDis         schoolMiganteil
##  <33% oekonomisch benachteiligt: 175   < 20% Miganteil:1645  
##  33-66% oekonomisch mittel     :2320   > 20% Miganteil:1360  
##  >66% oekonomisch bevorzugt    : 510                         
##                                                              
##                                                              
##                                                              
## 

1.1 tidy and transform data

psych::corPlot(r = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")])

# reverse-code the items that are keyed in the opposite direction (response scale 1-4: 1 <-> 4, 2 <-> 3)
datenLV$SSkMa_b <- 5 - datenLV$SSkMa_b
datenLV$SSkDe_b  <- 5 - datenLV$SSkDe_b
datenLV$SBezMs_c <- 5 - datenLV$SBezMs_c
datenLV$SBezMs_d <- 5 - datenLV$SBezMs_d

psych::corPlot(r = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")])
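The recode `5 - x` used above works because the items have the response categories 1 to 4; in general, a reversed item is `(min + max) - x`. A minimal sketch of that rule follows. The helper name `reverse_code` and its arguments are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: reverse-code an item with categories min_cat..max_cat
reverse_code <- function(x, min_cat = 1, max_cat = 4) {
  (min_cat + max_cat) - x
}

x <- c(1, 2, 3, 4, NA)
reverse_code(x)  # missing values stay missing

# applying the recode twice restores the original values,
# so a recoding chunk must be run exactly once
identical(reverse_code(reverse_code(x)), x)
```

Because the recode above overwrites the columns in place, re-running that chunk would silently undo it; writing the reversed item to a new column is one way to avoid this.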

1.2 describe, visualize your data and get a first impression using R base functions

In total, the data set contains N = 3005 students in 201 schools. The number of students per school is distributed as follows:

proSchule <- aggregate(datenLV$idsch_FDZ,by=list(datenLV$idsch_FDZ),FUN=length) # using base functions
summary(proSchule$x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   10.00   15.00   14.95   20.00   20.00
rm(proSchule)
plot(datenLV$wle_lesen, datenLV$wle_mathe)

boxplot(datenLV$wle_lesen ~ datenLV$Emigr)

t.test(datenLV$wle_lesen ~ datenLV$Emigr) # = unequal variances t-test
## 
##  Welch Two Sample t-test
## 
## data:  datenLV$wle_lesen by datenLV$Emigr
## t = -8.6373, df = 720.46, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6554145 -0.4126442
## sample estimates:
##     mean in group Mig mean in group keinMig 
##            -0.2612065             0.2728229

Are there linear relationships with a (normally distributed) numeric variable?

## linear regression
summary(lm(formula = wle_lesen ~ Emigr*tr_sex + SSkMa, data = datenLV))
## 
## Call:
## lm(formula = wle_lesen ~ Emigr * tr_sex + SSkMa, data = datenLV)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3692 -0.7007  0.0184  0.7098  3.8719 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -1.98582    0.12494 -15.894  < 2e-16 ***
## EmigrkeinMig                 0.39521    0.07802   5.065 4.38e-07 ***
## tr_sexweiblich               0.30605    0.10250   2.986  0.00286 ** 
## SSkMa                        0.52233    0.03229  16.174  < 2e-16 ***
## EmigrkeinMig:tr_sexweiblich  0.04767    0.11380   0.419  0.67535    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.108 on 2399 degrees of freedom
##   (601 observations deleted due to missingness)
## Multiple R-squared:  0.1324, Adjusted R-squared:  0.1309 
## F-statistic: 91.49 on 4 and 2399 DF,  p-value: < 2.2e-16

Are there linear relationships with a binary variable? Recommended page: https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/zusammenhaenge/lreg.html

## logistic regression (outcome with more than 2 ordered categories -> ordinal logistic regression)
summary(glm(Emigr ~ wle_lesen+wle_hoeren+wle_mathe,
               data = datenLV, family = binomial))
## 
## Call:
## glm(formula = Emigr ~ wle_lesen + wle_hoeren + wle_mathe, family = binomial, 
##     data = datenLV)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3848   0.4196   0.5748   0.6953   1.4128  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.36913    0.05226  26.198  < 2e-16 ***
## wle_lesen    0.17045    0.05383   3.167  0.00154 ** 
## wle_hoeren   0.12008    0.05871   2.045  0.04081 *  
## wle_mathe    0.33122    0.05930   5.585 2.33e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2464.3  on 2458  degrees of freedom
## Residual deviance: 2338.4  on 2455  degrees of freedom
##   (546 observations deleted due to missingness)
## AIC: 2346.4
## 
## Number of Fisher Scoring iterations: 4
exp(coef(glm(Emigr ~ wle_lesen+wle_hoeren+wle_mathe,
             data = datenLV, family = binomial))) - 1
## (Intercept)   wle_lesen  wle_hoeren   wle_mathe 
##   2.9319093   0.1858348   0.1275879   0.3926618
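On reading the last output: `exp(coef) - 1` turns each logit coefficient into the relative change in the odds per one-unit increase of the predictor. The value 0.3926618 for wle_mathe therefore means roughly 39 % higher odds of the modelled category (keinMig, the second factor level of Emigr). The intercept entry is not a change but the baseline odds minus 1. The following sketch shows the same computation on simulated data; the variable names and the assumed true coefficients (1.4 and 0.3) are illustrative only.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)

# assumed true model: log-odds of y = 1 are 1.4 + 0.3 * x
y <- rbinom(n, size = 1, prob = plogis(1.4 + 0.3 * x))

fit <- glm(y ~ x, family = binomial)

odds_ratio <- exp(coef(fit))          # multiplicative change in the odds
pct_change <- (odds_ratio - 1) * 100  # percent change in the odds

round(cbind(logit = coef(fit), odds_ratio, pct_change), 3)
```

An odds ratio above 1 (positive percent change) means the odds of the modelled category rise with the predictor; it does not translate into a constant change in probability.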

Note: hypothesis tests and logistic regression are the central procedures for the deductive method of item development.

1.3 missing data

## missing data patterns
mdpattern <- mice::md.pattern(x = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_")], plot = TRUE, rotate.names = TRUE)

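## sorted by pattern frequency (row names = number of cases showing the pattern)
## note: match() returns only the first hit for tied counts, so patterns with the same frequency appear as repeated rows below; indexing with order(as.numeric(rownames(mdpattern)), decreasing = TRUE) would list each pattern once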
mdpattern[match(x = sort(x = as.numeric(rownames(mdpattern)), decreasing = TRUE), table = as.numeric(rownames(mdpattern))), ]
##      SSkMa_a SSkDe_a SSkMa_d SSkMa_b SSkDe_b SSkMa_c SSkDe_d SSkDe_c  
## 2756       1       1       1       1       1       1       1       1 0
## 50         0       0       0       0       0       0       0       0 8
## 22         1       0       1       1       0       1       0       0 4
## 16         1       1       1       1       1       1       1       0 1
## 16         1       1       1       1       1       1       1       0 1
## 15         1       1       1       0       1       1       1       1 1
## 14         1       1       1       1       1       1       0       1 1
## 12         1       1       1       1       1       0       1       1 1
## 11         0       1       0       0       1       0       1       1 4
## 10         0       1       1       1       1       1       1       1 1
## 7          1       1       1       0       1       0       1       1 2
## 6          1       1       0       0       1       0       1       1 3
## 5          1       1       0       1       1       1       1       1 1
## 4          1       1       1       1       1       1       0       0 2
## 4          1       1       1       1       1       1       0       0 2
## 4          1       1       1       1       1       1       0       0 2
## 3          1       1       1       0       0       1       1       1 2
## 3          1       1       1       0       0       1       1       1 2
## 3          1       1       1       0       0       1       1       1 2
## 3          1       1       1       0       0       1       1       1 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 2          1       1       1       1       1       0       1       0 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## 1          1       1       1       1       0       1       0       1 2
## generate dummy of missing variable to identify potential helper variables
datenLV$missing_SSkMa_a <- ifelse(test = is.na(datenLV$SSkMa_a), yes = 1, no = 0)
helpervars <- c("wle_lesen", "wle_hoeren", "SSkDe") # normally many more variables would be included
for(v in helpervars){
  tmp <- t.test(datenLV[[v]] ~ datenLV$missing_SSkMa_a)
  if(tmp$p.value < .05){
    print(v)
    print(tmp)
  }
}
## [1] "wle_lesen"
## 
##  Welch Two Sample t-test
## 
## data:  datenLV[[v]] by datenLV$missing_SSkMa_a
## t = 4.0297, df = 84.449, p-value = 0.0001217
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3177873 0.9369377
## sample estimates:
## mean in group 0 mean in group 1 
##       0.1107931      -0.5165693 
## 
## [1] "wle_hoeren"
## 
##  Welch Two Sample t-test
## 
## data:  datenLV[[v]] by datenLV$missing_SSkMa_a
## t = 4.4992, df = 83.294, p-value = 2.193e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.4094216 1.0581596
## sample estimates:
## mean in group 0 mean in group 1 
##       0.1294574      -0.6043332
## overall missing for each single variable
round(x = sort(x = colSums(x = is.na(datenLV), na.rm = TRUE), decreasing = TRUE) / nrow(datenLV) * 100, digits = 2)
##          EHisei           Emigr     EHisced_akt           EDezh        SBezMs_b 
##           20.70           18.17           16.14           15.57            6.62 
##        SBezMs_c        SBezMs_d        SBezMs_a          SBezMs        tr_NotDe 
##            5.76            5.72            5.46            4.83            4.36 
##        tr_NotMa         SSkDe_c         SSkDe_d         SSkDe_b         SSkMa_c 
##            4.23            4.19            4.09            4.03            4.03 
##         SSkMa_b         SSkMa_d          SLesZt           SSkDe         SSkDe_a 
##            3.83            3.39            3.23            3.16            3.06 
##           SSkMa         SSkMa_a        SBuecher          tr_age        tr_Wdh_r 
##            2.83            2.73            2.33            0.23            0.23 
##       idsch_FDZ      idstud_FDZ          tr_sex       wle_lesen      wle_hoeren 
##            0.00            0.00            0.00            0.00            0.00 
##       wle_mathe   schoolEconDis schoolMiganteil missing_SSkMa_a 
##            0.00            0.00            0.00            0.00

Replacing missing data, or estimating them in a model-based way, is essential. The two most modern approaches to handling missing data are:

  • multiple imputation, introductory book: https://stefvanbuuren.name/fimd/, Grund, Lüdtke, and Robitzsch (2018)
  • full information maximum likelihood (implemented in Mplus; in R it is available, for example, in lavaan via the argument missing = "fiml")

1.4 outlier analysis

A distinction is made between univariate and multivariate outliers. Because structural equation modelling / CFA are multivariate procedures (several independent and dependent variables), the data have to be checked for multivariate outliers. The Mahalanobis distance is suitable for this:

## exemplify Mahalanobis Distance
sigma <- matrix(c(4,1,2,1,5,4,2,4,6), ncol = 3)
cov2cor(sigma)
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.2236068 0.4082483
## [2,] 0.2236068 1.0000000 0.7302967
## [3,] 0.4082483 0.7302967 1.0000000
means <- c(0, 0, 0)
set.seed(42)
n <- 1000
x <- rmvnorm(n = n, mean = means, sigma = sigma)
d <- data.frame(x)
p4 <- plot_ly(d, x = ~ X1, y = ~ X2, z = ~ X3,
              marker = list(color = ~ X2,
                            showscale = TRUE)) %>%
  add_markers()

p4
## identify multivariate outliers
d$mahal <- mahalanobis(d, colMeans(d), cov(d))
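## note: squared Mahalanobis distances are referred to a chi-square distribution with df = number of variables (3 here); the output below was produced with df = 2
d$p_mahal <- pchisq(d$mahal, df=2, lower.tail=FALSE)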
d[d$p_mahal < .001, ]
##            X1        X2        X3    mahal      p_mahal
## 274  5.759481 -4.929943 -1.450310 16.41275 0.0002729077
## 330  7.398060  3.344539  4.741212 14.50380 0.0007088271
## 980 -6.295530 -5.365858 -4.367558 14.17879 0.0008339001
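The squared Mahalanobis distance used above is D² = (x - μ)' Σ⁻¹ (x - μ). The sketch below computes it by hand for one observation and compares it with `mahalanobis()`. The simulated matrix `X` and all object names are assumptions for illustration, and only base R is used.

```r
set.seed(42)
p <- 3                                   # number of variables
X <- matrix(rnorm(500 * p), ncol = p)    # simulated data, for illustration only
mu <- colMeans(X)
S  <- cov(X)

# squared distance of the first observation, computed by hand
x1 <- X[1, ]
d2_manual <- as.numeric(t(x1 - mu) %*% solve(S) %*% (x1 - mu))

# the same quantity for all observations
d2_builtin <- mahalanobis(X, center = mu, cov = S)
all.equal(d2_manual, unname(d2_builtin[1]))

# under multivariate normality D^2 is approximately chi-square with df = p
p_value <- pchisq(d2_builtin, df = p, lower.tail = FALSE)
which(p_value < .001)                    # candidate multivariate outliers
```

Setting `df` to the number of variables entering the distance keeps the p-values aligned with the reference distribution; a smaller `df` flags more cases.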
items_SSkMa <- str_subset(string = colnames(datenLV), pattern = "^SSkMa_")
datenLV$mahal_SSkMa <- mahalanobis(datenLV[, items_SSkMa],
                                   colMeans(datenLV[, items_SSkMa], na.rm = TRUE),
                                   cov(datenLV[, items_SSkMa], use = "pairwise"))
## note: the chi-square reference distribution has df = number of variables (4 items here); the output below was produced with df = 3
datenLV$p_mahal_SSkMa <- pchisq(datenLV$mahal_SSkMa, df=3, lower.tail=FALSE)

## identify multivariate outliers
head(datenLV[datenLV$p_mahal_SSkMa < .001  & !is.na(datenLV$p_mahal_SSkMa), c("SSkMa_a", "SSkMa_b", "SSkMa_c", "SSkMa_d", "mahal_SSkMa", "p_mahal_SSkMa")])
##    SSkMa_a SSkMa_b SSkMa_c SSkMa_d mahal_SSkMa p_mahal_SSkMa
## 8        1       1       4       1    22.80886  4.426227e-05
## 9        1       3       1       4    29.61531  1.662648e-06
## 15       1       1       4       4    17.85799  4.705288e-04
## 25       4       4       1       4    22.55417  5.001386e-05
## 46       4       1       1       4    28.13860  3.396688e-06
## 58       2       4       1       1    17.17129  6.516648e-04
sum(datenLV$p_mahal_SSkMa < .001, na.rm = TRUE)
## [1] 112
datenLV$intravariability_SSkMa <- apply(datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_")], MARGIN=1, FUN = sd, na.rm=TRUE)
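## identify insufficient-effort responding (straightlining) via the within-person variability of the answers
## note: SSkMa_b was reverse-coded above, so identical values here indicate consistent rather than careless answers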
head(datenLV[datenLV$intravariability_SSkMa == 0  & !is.na(datenLV$intravariability_SSkMa), c("SSkMa_a", "SSkMa_b", "SSkMa_c", "SSkMa_d", "mahal_SSkMa", "p_mahal_SSkMa")])
##    SSkMa_a SSkMa_b SSkMa_c SSkMa_d mahal_SSkMa p_mahal_SSkMa
## 4        4       4       4       4   1.3060837      0.727689
## 5        3       3       3       3   0.2810342      0.963555
## 6        4       4       4       4   1.3060837      0.727689
## 7        4       4       4       4   1.3060837      0.727689
## 11       4       4       4       4   1.3060837      0.727689
## 13       3       3       3       3   0.2810342      0.963555
sum(datenLV$intravariability_SSkMa == 0, na.rm = TRUE)
## [1] 1049

2 [simulation study: standardized residuals, reliability]

Using the R-Package simstudy it is possible to generate all kinds of data:

I generated data sets with 3 items (y1-y3) and with 7 items (m1-m7) for different sample sizes. The variables latentvar and errorvar are unobserved in real data; they matter, for example, in classical test theory, where they correspond to the true-score variance and the error variance:

##       varname   formula     variance   dist     link
##  1: latentvar        20          0.5 normal identity
##  2:  errorvar         4          0.5 normal identity
##  3:        y1 latentvar errorvar / 4 normal identity
##  4:        y2 latentvar errorvar / 4 normal identity
##  5:        y3 latentvar errorvar / 4 normal identity
##  6:        m1 latentvar errorvar / 4 normal identity
##  7:        m2 latentvar errorvar / 4 normal identity
##  8:        m3 latentvar errorvar / 4 normal identity
##  9:        m4 latentvar errorvar / 4 normal identity
## 10:        m5 latentvar errorvar / 4 normal identity
## 11:        m6 latentvar errorvar / 4 normal identity
## 12:        m7 latentvar errorvar / 4 normal identity
set.seed(111)
dt_50 <- genData(50, def); dt_50 <- as.data.frame(dt_50)
dt_200 <- genData(200, def); dt_200 <- as.data.frame(dt_200)
dt_500 <- genData(500, def); dt_500 <- as.data.frame(dt_500)
dt_100000 <- genData(100000, def); dt_100000 <- as.data.frame(dt_100000)
round(x = cor(dt_50[, str_subset(string = colnames(dt_50), pattern = "m")]), digits = 2)
##      m1   m2   m3   m4   m5   m6   m7
## m1 1.00 0.45 0.39 0.34 0.35 0.43 0.24
## m2 0.45 1.00 0.52 0.34 0.56 0.40 0.49
## m3 0.39 0.52 1.00 0.25 0.44 0.39 0.57
## m4 0.34 0.34 0.25 1.00 0.19 0.30 0.35
## m5 0.35 0.56 0.44 0.19 1.00 0.20 0.37
## m6 0.43 0.40 0.39 0.30 0.20 1.00 0.49
## m7 0.24 0.49 0.57 0.35 0.37 0.49 1.00
round(x = cor(dt_100000[, str_subset(string = colnames(dt_100000), pattern = "m")]), digits = 2)
##      m1   m2   m3   m4   m5   m6   m7
## m1 1.00 0.34 0.33 0.34 0.33 0.33 0.34
## m2 0.34 1.00 0.33 0.34 0.34 0.34 0.34
## m3 0.33 0.33 1.00 0.33 0.33 0.33 0.34
## m4 0.34 0.34 0.33 1.00 0.33 0.34 0.34
## m5 0.33 0.34 0.33 0.33 1.00 0.33 0.33
## m6 0.33 0.34 0.33 0.34 0.33 1.00 0.34
## m7 0.34 0.34 0.34 0.34 0.33 0.34 1.00
round(x = cor(dt_50[, str_subset(string = colnames(dt_50), pattern = "m")]), digits = 2) - round(x = cor(dt_100000[, str_subset(string = colnames(dt_100000), pattern = "m")]), digits = 2)
##       m1   m2    m3    m4    m5    m6    m7
## m1  0.00 0.11  0.06  0.00  0.02  0.10 -0.10
## m2  0.11 0.00  0.19  0.00  0.22  0.06  0.15
## m3  0.06 0.19  0.00 -0.08  0.11  0.06  0.23
## m4  0.00 0.00 -0.08  0.00 -0.14 -0.04  0.01
## m5  0.02 0.22  0.11 -0.14  0.00 -0.13  0.04
## m6  0.10 0.06  0.06 -0.04 -0.13  0.00  0.15
## m7 -0.10 0.15  0.23  0.01  0.04  0.15  0.00
sd(dt_50$m1) / sqrt(x = length(dt_50$m1)) 
## [1] 0.18367
sd(dt_100000$m1) / sqrt(x = length(dt_100000$m1)) 
## [1] 0.003876399
psych::alpha(cor(dt_50[, str_subset(string = colnames(dt_50), pattern = "m")]))$total
##  raw_alpha std.alpha   G6(smc) average_r      S/N  median_r
##  0.8132445 0.8132445 0.8146988 0.3835094 4.354593 0.3878958
psych::alpha(cor(dt_200[, str_subset(string = colnames(dt_200), pattern = "m")]))$total
##  raw_alpha std.alpha   G6(smc) average_r      S/N  median_r
##  0.7631399 0.7631399 0.7410928 0.3151959 3.221902 0.3114825
psych::alpha(cor(dt_100000[, str_subset(string = colnames(dt_100000), pattern = "m")]))$total
##  raw_alpha std.alpha   G6(smc) average_r      S/N  median_r
##  0.7781921 0.7781921 0.7504623 0.3338666 3.508406 0.3353357
psych::alpha(cor(dt_50[, str_subset(string = colnames(dt_50), pattern = "y")]))$total
##  raw_alpha std.alpha   G6(smc) average_r     S/N  median_r
##  0.6202657 0.6202657 0.5319161 0.3525301 1.63342 0.3163457
psych::alpha(cor(dt_200[, str_subset(string = colnames(dt_200), pattern = "y")]))$total
##  raw_alpha std.alpha   G6(smc) average_r      S/N  median_r
##  0.5365776 0.5365776 0.4427281 0.2784747 1.157858 0.3205137
psych::alpha(cor(dt_100000[, str_subset(string = colnames(dt_100000), pattern = "y")]))$total
##  raw_alpha std.alpha   G6(smc) average_r      S/N  median_r
##  0.5972012 0.5972012 0.4971001 0.3307499 1.482629 0.3324383
psych::omega(m = dt_100000[, str_subset(string = colnames(dt_100000), pattern = "y")], nfactors = 1, plot = FALSE)
## Loading required namespace: GPArotation
## Omega_h for 1 factor is not meaningful, just omega_t
## Warning in schmid(m, nfactors, fm, digits, rotate = rotate, n.obs = n.obs, :
## Omega_h and Omega_asymptotic are not meaningful with one factor
## Warning in cov2cor(t(w) %*% r %*% w): diag(.) had 0 or NA entries; non-finite
## result is doubtful
## Omega 
## Call: omegah(m = m, nfactors = nfactors, fm = fm, key = key, flip = flip, 
##     digits = digits, title = title, sl = sl, labels = labels, 
##     plot = plot, n.obs = n.obs, rotate = rotate, Phi = Phi, option = option, 
##     covar = covar)
## Alpha:                 0.6 
## G.6:                   0.5 
## Omega Hierarchical:    0.6 
## Omega H asymptotic:    1 
## Omega Total            0.6 
## 
## Schmid Leiman Factor loadings greater than  0.2 
##       g  F1*   h2   u2 p2
## y1 0.58      0.34 0.66  1
## y2 0.57      0.33 0.67  1
## y3 0.57      0.33 0.67  1
## 
## With eigenvalues of:
##    g  F1* 
## 0.99 0.00 
## 
## general/max  Inf   max/min =   NaN
## mean percent general =  1    with sd =  0 and cv of  0 
## Explained Common Variance of the general factor =  1 
## 
## The degrees of freedom are 0  and the fit is  0 
## The number of observations was  100000  with Chi Square =  0  with prob <  NA
## The root mean square of the residuals is  0 
## The df corrected root mean square of the residuals is  NA
## 
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 0  and the fit is  0 
## The number of observations was  100000  with Chi Square =  0  with prob <  NA
## The root mean square of the residuals is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## Measures of factor score adequacy             
##                                                  g F1*
## Correlation of scores with factors            0.77   0
## Multiple R square of scores with factors      0.60   0
## Minimum correlation of factor score estimates 0.19  -1
## 
##  Total, General and Subset omega for each subset
##                                                 g F1*
## Omega total for total scores and subscales    0.6 0.6
## Omega general for total scores and subscales  0.6 0.6
## Omega group for total scores and subscales    0.0 0.0

Summary:

3 developing a questionnaire scale

3.1 descriptive analysis using classical test theory

Based on chapters 7 and 13 of Moosbrugger and Kelava (2020)

\(y_i = \tau_i + \epsilon_i\); the definition of reliability follows from measurement error theory: \(Rel(Y) = \frac{Var(T)}{Var(T) + Var(E)}\)

\(E(y_i) = E(\tau_i) + E(\epsilon_i)\)

\(E(y_i) = E(\tau_i) + 0\)

Across several items of a scale, a point estimate of the true score \(\tau_i\) can be computed as a sum score: \(Y = \sum_{i=1}^p y_i\), or, more easily interpretable, as the person mean: \(\bar{Y} = \frac{\sum_{i=1}^p y_i}{p}\), where \(p\) is the number of items. Caution: this is only a preliminary test score, because unidimensionality and a tau-equivalent measurement model must actually hold.
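The reliability definition can be made concrete with simulated scores for which, unlike in real data, the true score and the error are known. This is a minimal sketch; the chosen variances (Var(T) = 1, Var(E) = 0.25) and all object names are assumptions for illustration.

```r
set.seed(123)
n <- 10000
tau <- rnorm(n, mean = 3, sd = 1)     # true scores, Var(T) = 1
eps <- rnorm(n, mean = 0, sd = 0.5)   # measurement error, E(eps) = 0, Var(E) = 0.25
y   <- tau + eps                      # observed scores

rel_theoretical <- 1 / (1 + 0.25)                    # Var(T) / (Var(T) + Var(E))
rel_empirical   <- var(tau) / (var(tau) + var(eps))  # same ratio from the sample

c(theoretical = rel_theoretical,
  empirical   = rel_empirical,
  cor_squared = cor(y, tau)^2)   # reliability = squared correlation of observed and true scores
```

In real data neither `tau` nor `eps` is observed, which is why reliability has to be estimated from the relations among the items (for example alpha or omega).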

3.1.1 Item difficulty index

\(P_i = \frac{\sum_{v=1}^n y_{vi}}{n*max(y_i)} *100\)

The following figures indicate how easy each item is (higher values = easier item, i.e. more agreement):

datenLV[,str_subset(string = colnames(datenLV), pattern = "^SBezMs_")] <- datenLV[,str_subset(string = colnames(datenLV), pattern = "^SBezMs_")] - 1
datenLV$failitem <- rbinom(n = nrow(datenLV), size = 3, prob = .95)
head(datenLV[, c(str_subset(string = colnames(datenLV), pattern = "^SBezMs_"), "failitem")])
##   SBezMs_a SBezMs_b SBezMs_c SBezMs_d failitem
## 1        3        2        3        3        2
## 2        2        2        3        3        3
## 3        3        0        3        2        3
## 4        0        3        0        0        3
## 5        3        2        0        0        3
## 6        3        3        3        3        3
sum(datenLV$SBezMs_a, na.rm = TRUE) / (sum(!is.na(datenLV$SBezMs_a)) * max(datenLV$SBezMs_a, na.rm = TRUE)) * 100
## [1] 78.82201
sum(datenLV$SBezMs_b , na.rm = TRUE) / (sum(!is.na(datenLV$SBezMs_b)) * max(datenLV$SBezMs_b, na.rm = TRUE)) * 100
## [1] 70.34925
sum(datenLV$SBezMs_c , na.rm = TRUE) / (sum(!is.na(datenLV$SBezMs_c)) * max(datenLV$SBezMs_c, na.rm = TRUE)) * 100 
## [1] 78.88418
sum(datenLV$SBezMs_d, na.rm = TRUE) / (sum(!is.na(datenLV$SBezMs_d)) * max(datenLV$SBezMs_d, na.rm = TRUE)) * 100
## [1] 83.55101
sum(datenLV$failitem, na.rm = TRUE) / (sum(!is.na(datenLV$failitem)) * max(datenLV$failitem, na.rm = TRUE)) * 100
## [1] 95.50749
datenLV[,str_subset(string = colnames(datenLV), pattern = "^SBezMs_")] <- datenLV[,str_subset(string = colnames(datenLV), pattern = "^SBezMs_")] + 1
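The five near-identical lines above can be condensed into one helper that is applied to all items at once. This is a sketch under the same assumptions as the formula (lowest category coded 0, observed maximum used as the scale maximum); the function name `item_difficulty` and the toy data are hypothetical.

```r
# Hypothetical helper: difficulty index P_i in percent for one item
item_difficulty <- function(y) {
  y <- y[!is.na(y)]                      # use observed responses only
  sum(y) / (length(y) * max(y)) * 100    # assumes the lowest category is coded 0
}

toy <- data.frame(item_a = c(3, 2, 3, 0, NA),
                  item_b = c(0, 1, 0, 3, 2))
sapply(toy, item_difficulty)
```

If no respondent chose the highest category, `max(y)` underestimates the scale maximum; passing the theoretical maximum as an extra argument would be the safer variant.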

3.1.2 Item variance

\(Var(y_i) = \frac{\sum_{v=1}^n (y_{vi} - \bar{y_i})^2}{n}\)

sum((datenLV$SBezMs_a - mean(datenLV$SBezMs_a, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(datenLV$SBezMs_a)) # divides by n; var(datenLV$SBezMs_a, na.rm = TRUE) divides by n - 1 and is therefore slightly larger
## [1] 0.5393213
sum((datenLV$SBezMs_b - mean(datenLV$SBezMs_b, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(datenLV$SBezMs_b))
## [1] 0.8131689
sum((datenLV$SBezMs_c - mean(datenLV$SBezMs_c, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(datenLV$SBezMs_c)) 
## [1] 0.8543597
sum((datenLV$SBezMs_d - mean(datenLV$SBezMs_d, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(datenLV$SBezMs_d))
## [1] 0.6580054
sum((datenLV$failitem - mean(datenLV$failitem, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(datenLV$failitem))
## [1] 0.1325844
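The formula above divides by n, whereas R's `var()` divides by n - 1. The sketch below shows the relation between the two on toy data; the helper name `pop_var` is hypothetical.

```r
# Hypothetical helper: variance with denominator n (as in the formula above)
pop_var <- function(y) {
  y <- y[!is.na(y)]
  sum((y - mean(y))^2) / length(y)
}

y <- c(1, 2, 2, 3, 4, 4, NA)
n <- sum(!is.na(y))

pop_var(y)
var(y, na.rm = TRUE)                  # denominator n - 1, slightly larger
var(y, na.rm = TRUE) * (n - 1) / n    # rescaled to denominator n
```

With roughly 2800 valid responses per item, as in the data above, the two versions differ only from about the fourth decimal place on.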

3.1.3 Item discrimination

Part-whole corrected item discrimination \(r_{it(i)}\): \(r_{it(i)} = r_{(y_i, y(i))}\), i.e. the correlation of item \(i\) with the score built from the remaining items

## note: rowSums(..., na.rm = TRUE) counts missing responses as 0 in the rest score
cor(datenLV$SBezMs_a, rowSums(datenLV[, c("SBezMs_b", "SBezMs_c", "SBezMs_d")], na.rm = TRUE), use = "complete")
## [1] 0.5407678
cor(datenLV$SBezMs_b, rowSums(datenLV[, c("SBezMs_a", "SBezMs_c", "SBezMs_d")], na.rm = TRUE), use = "complete")
## [1] 0.4083451
cor(datenLV$SBezMs_c, rowSums(datenLV[, c("SBezMs_a", "SBezMs_b", "SBezMs_d")], na.rm = TRUE), use = "complete")
## [1] 0.4591703
cor(datenLV$SBezMs_d, rowSums(datenLV[, c("SBezMs_a", "SBezMs_b", "SBezMs_c")], na.rm = TRUE), use = "complete")
## [1] 0.3990633
cor(datenLV$failitem, rowSums(datenLV[, c("SBezMs_a", "SBezMs_b", "SBezMs_c", "SBezMs_d")], na.rm = TRUE), use = "complete")
## [1] 0.006453734
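The corrected item-total correlations can also be computed for all items in one pass. This sketch differs from the calls above in one respect: it keeps only cases with complete responses on all items, so that a missing answer is not counted as 0 in the rest score. The function name `corrected_item_total` and the simulated toy data are hypothetical.

```r
# Hypothetical helper: part-whole corrected item-total correlation for every column
corrected_item_total <- function(items) {
  items <- items[complete.cases(items), , drop = FALSE]   # listwise deletion
  r <- sapply(seq_len(ncol(items)), function(i) {
    cor(items[[i]], rowSums(items[, -i, drop = FALSE]))   # item vs. sum of the other items
  })
  names(r) <- colnames(items)
  r
}

set.seed(7)
n <- 300
trait <- rnorm(n)
toy <- data.frame(i1 = trait + rnorm(n),
                  i2 = trait + rnorm(n),
                  i3 = trait + rnorm(n),
                  noise = rnorm(n))     # item unrelated to the trait
toy$i2[1:10] <- NA

round(corrected_item_total(toy), 2)
```

An item that is unrelated to the construct, like `noise` here or `failitem` above, shows a correlation near zero with the rest score.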

3.1.4 Test score distribution

Based on chapter 8 of Moosbrugger and Kelava (2020)

## the scale score is already contained in the data
cor(rowMeans(x = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SBezMs_")]), datenLV$SBezMs, use = "complete")
## [1] 1
hist(datenLV$SBezMs, freq = FALSE)
abline(v = mean(datenLV$SBezMs, na.rm = TRUE))
lines(density(datenLV$SBezMs[!is.na(datenLV$SBezMs)]), col="red") # empirical density
lines(seq(0, 5, by=.1), dnorm(seq(0, 5, by=.1),
      mean(datenLV$SBezMs, na.rm = TRUE), sd(datenLV$SBezMs, na.rm = TRUE)), col="blue") # normal density

sd(x = datenLV$SBezMs, na.rm = TRUE)
## [1] 0.6099304
moments::skewness(x = datenLV$SBezMs, na.rm = TRUE)
## [1] -1.013992
moments::kurtosis(x = datenLV$SBezMs, na.rm = TRUE) - 3 # = SPSS output
## [1] 0.7847189
shapiro.test(x = datenLV$SBezMs)
## 
##  Shapiro-Wilk normality test
## 
## data:  datenLV$SBezMs
## W = 0.89501, p-value < 2.2e-16
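
Skewness and excess kurtosis can also be computed from central moments in base R. This sketch uses simulated, left-skewed scores; `central_moment()` is a hypothetical helper, and the moment-based definitions used here should correspond to what moments::skewness() and moments::kurtosis() compute (treat that as an assumption and check the package documentation).

```r
set.seed(7)
# Hypothetical left-skewed scores with one missing value
x <- c(rbeta(500, shape1 = 5, shape2 = 2), NA)

# k-th central moment, dividing by n
central_moment <- function(x, k) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^k)
}

skew_hand <- central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
excess_kurt_hand <- central_moment(x, 4) / central_moment(x, 2)^2 - 3
c(skewness = skew_hand, excess_kurtosis = excess_kurt_hand)
```

A negative skewness indicates a longer left tail, as in the histogram of SBezMs above.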

For a norm-referenced comparison, a z-standardization \(\frac{Y_v - \bar{Y}}{SD(Y)}\) is a natural choice:

datenLV$Zstand_SBezMs <- as.numeric(scale(x = datenLV$SBezMs, center = TRUE, scale = TRUE)) # as.numeric() drops the matrix structure that scale() returns
hist(datenLV$Zstand_SBezMs, freq = FALSE)
abline(v = mean(datenLV$Zstand_SBezMs, na.rm = TRUE))
lines(density(datenLV$Zstand_SBezMs[!is.na(datenLV$Zstand_SBezMs)]), col="red")
lines(seq(-4, 4, by=.1), dnorm(seq(-4, 4, by=.1),
      mean(datenLV$Zstand_SBezMs, na.rm = TRUE), sd(datenLV$Zstand_SBezMs, na.rm = TRUE)), col="blue")
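
The z-standardization formula can be checked against scale() on a few made-up scores. The vector `y` is hypothetical; the sketch only illustrates that both routes give the same result and that missing values stay missing.

```r
y <- c(2.5, 3.0, NA, 4.0, 1.75, 3.25)  # hypothetical scale scores

z_hand <- (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)
z_scale <- as.numeric(scale(y))  # scale() returns a one-column matrix

all.equal(z_hand, z_scale)
mean(z_hand, na.rm = TRUE)  # approximately 0
sd(z_hand, na.rm = TRUE)    # 1
```

Standardizing only shifts and rescales the scores, so the shape of the distribution (including its skewness) is unchanged.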

3.2 exploratory factor analysis

inductive method

Strictly speaking, CTT (classical test theory) also covers testing for measurement invariance via the classical test models. These models require unidimensionality, however, which can be examined with an exploratory factor analysis (EFA).

This section uses the psych package in R (see http://personality-project.org/r/psych/HowTo/factor.pdf). Alternatively, the statistics program JASP is also suitable for EFA / CFA (https://jasp-stats.org/).

Introductory articles on EFA: Costello and Osborne (2005), Mvududu and Sink (2013). (Note: there are hybrid forms between EFA and CFA, for example ESEM: Marsh et al. (2014).)

Important: do not run a principal component analysis (PCA) in place of an EFA. PCA is a relic of the past; its basic principles are the same as those of EFA, but it does not decompose the item variance into common and unique parts.
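
The variance decomposition in question can be written out for a standardized item \(y_i\). The notation is chosen for this sketch (\(\lambda_{ij}\) for the loading of item \(i\) on factor \(j\), \(h_i^2\) for the communality, \(u_i^2\) for the uniqueness) and assumes \(m\) uncorrelated factors:

\[ Var(y_i) = \underbrace{\sum_{j=1}^{m} \lambda_{ij}^2}_{h_i^2} + u_i^2 = 1 \]

EFA models only the common part \(h_i^2\) and leaves \(u_i^2\) (specific plus error variance) item-specific, whereas PCA distributes the total variance across the components. The h2 and u2 columns in the fa() output further below report exactly these two parts and add up to 1 for each item.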

The goals of exploratory factor analysis are

  • reducing the dimension of the covariance or correlation matrix
  • identifying latent variables (e.g. via principal axis factoring) and
  • differentiating a complex construct domain into homogeneous subdomains, i.e. the variables are grouped so that they correlate as highly as possible within a group (homogeneous) while the groups of variables are as heterogeneous as possible relative to each other. This pursues the same goal as a cluster analysis (latent class analysis)

An EFA proceeds in four steps:

  1. preliminary tests
  2. choice of the extraction method
  3. choice of a stopping criterion and finally
  4. choice of the rotation method
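
The four steps can be sketched with base R only. This is a simplified stand-in, not the psych-based analysis used in this document: the data are simulated, the preliminary test is Bartlett's test of sphericity computed by hand, extraction uses maximum likelihood via stats::factanal(), the stopping rule is the crude eigenvalue-greater-one criterion, and the rotation is promax. All object names are hypothetical.

```r
set.seed(1)
n <- 500
f1 <- rnorm(n)
f2 <- 0.3 * f1 + sqrt(1 - 0.3^2) * rnorm(n)  # two correlated hypothetical factors
items <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b3 = 0.8 * f2 + rnorm(n, sd = 0.6)
)
R <- cor(items)
p <- ncol(items)

# Step 1, preliminary test: Bartlett's test of sphericity
# (H0: the correlation matrix is an identity matrix)
bartlett_chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
bartlett_p <- pchisq(bartlett_chisq, df = p * (p - 1) / 2, lower.tail = FALSE)

# Step 3, stopping rule: number of eigenvalues of R greater than 1
n_factors <- sum(eigen(R)$values > 1)

# Steps 2 and 4: maximum likelihood extraction with an oblique (promax) rotation
efa_sketch <- factanal(items, factors = n_factors, rotation = "promax")
print(efa_sketch$loadings, cutoff = 0.3)
```

An oblique rotation is used because the simulated factors are correlated, just as the three self-concept factors are allowed to correlate in the analysis below.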

for help with interpreting the results, see the blog post by Michael Clark: https://m-clark.github.io/posts/2020-04-10-psych-explained/

psych::corPlot(r = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")])

## not accounting for the non-normal / skewed data
efa1 <- fa(r = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")], nfactors = 3, rotate = "oblimin")
fa.diagram(efa1)

efa1
## Factor Analysis using method =  minres
## Call: fa(r = datenLV[, str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")], 
##     nfactors = 3, rotate = "oblimin")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            MR1   MR3   MR2   h2   u2 com
## SSkDe_a  -0.07  0.70  0.04 0.47 0.53   1
## SSkDe_b   0.04  0.51  0.03 0.28 0.72   1
## SSkDe_c   0.09  0.57 -0.01 0.37 0.63   1
## SSkDe_d   0.02  0.75 -0.03 0.56 0.44   1
## SSkMa_a   0.80 -0.02  0.01 0.63 0.37   1
## SSkMa_b   0.53  0.07  0.03 0.32 0.68   1
## SSkMa_c   0.69  0.06  0.00 0.51 0.49   1
## SSkMa_d   0.84 -0.03 -0.01 0.70 0.30   1
## SBezMs_a  0.00 -0.01  0.76 0.57 0.43   1
## SBezMs_b -0.04  0.05  0.53 0.29 0.71   1
## SBezMs_c  0.04 -0.02  0.59 0.35 0.65   1
## SBezMs_d  0.01  0.00  0.53 0.28 0.72   1
## 
##                        MR1  MR3  MR2
## SS loadings           2.16 1.68 1.50
## Proportion Var        0.18 0.14 0.13
## Cumulative Var        0.18 0.32 0.44
## Proportion Explained  0.40 0.31 0.28
## Cumulative Proportion 0.40 0.72 1.00
## 
##  With factor correlations of 
##      MR1  MR3  MR2
## MR1 1.00 0.36 0.20
## MR3 0.36 1.00 0.24
## MR2 0.20 0.24 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  66  and the objective function was  3.47 with Chi Square of  10398
## The degrees of freedom for the model are 33  and the objective function was  0.43 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.07 
## 
## The harmonic number of observations is  2811 with the empirical chi square  868.15  with prob <  4.6e-161 
## The total number of observations was  3005  with Likelihood Chi Square =  1288.18  with prob <  1.3e-249 
## 
## Tucker Lewis Index of factoring reliability =  0.757
## RMSEA index =  0.113  and the 90 % confidence intervals are  0.107 0.118
## BIC =  1023.92
## Fit based upon off diagonal values = 0.97
## Measures of factor score adequacy             
##                                                    MR1  MR3  MR2
## Correlation of (regression) scores with factors   0.92 0.88 0.86
## Multiple R square of scores with factors          0.85 0.77 0.73
## Minimum correlation of possible factor scores     0.70 0.53 0.47
### accounting partly for the non-normal / skewed data using polychoric correlations (limited information approach)
efa2choric <- fa(r = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")], nfactors = 3, rotate = "oblimin", fm = "wls", max.iter = 500, cor = "poly", scores = "Bartlett")
efa2choric
## Factor Analysis using method =  wls
## Call: fa(r = datenLV[, str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")], 
##     nfactors = 3, rotate = "oblimin", scores = "Bartlett", max.iter = 500, 
##     fm = "wls", cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           WLS1  WLS3  WLS2   h2   u2 com
## SSkDe_a  -0.09  0.76  0.04 0.55 0.45 1.0
## SSkDe_b   0.04  0.57  0.03 0.36 0.64 1.0
## SSkDe_c   0.10  0.66 -0.01 0.49 0.51 1.1
## SSkDe_d   0.03  0.82 -0.03 0.67 0.33 1.0
## SSkMa_a   0.86 -0.03  0.01 0.72 0.28 1.0
## SSkMa_b   0.62  0.06  0.02 0.43 0.57 1.0
## SSkMa_c   0.77  0.07  0.00 0.64 0.36 1.0
## SSkMa_d   0.90 -0.02  0.00 0.80 0.20 1.0
## SBezMs_a  0.00 -0.01  0.83 0.68 0.32 1.0
## SBezMs_b -0.05  0.05  0.59 0.36 0.64 1.0
## SBezMs_c  0.04 -0.02  0.69 0.48 0.52 1.0
## SBezMs_d  0.00  0.01  0.63 0.40 0.60 1.0
## 
##                       WLS1 WLS3 WLS2
## SS loadings           2.60 2.06 1.92
## Proportion Var        0.22 0.17 0.16
## Cumulative Var        0.22 0.39 0.55
## Proportion Explained  0.39 0.31 0.29
## Cumulative Proportion 0.39 0.71 1.00
## 
##  With factor correlations of 
##      WLS1 WLS3 WLS2
## WLS1 1.00 0.41 0.23
## WLS3 0.41 1.00 0.27
## WLS2 0.23 0.27 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  66  and the objective function was  5.63 with Chi Square of  16873.08
## The degrees of freedom for the model are 33  and the objective function was  0.89 
## 
## The root mean square of the residuals (RMSR) is  0.06 
## The df corrected root mean square of the residuals is  0.08 
## 
## The harmonic number of observations is  2811 with the empirical chi square  1148.38  with prob <  5e-220 
## The total number of observations was  3005  with Likelihood Chi Square =  2678.78  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.685
## RMSEA index =  0.163  and the 90 % confidence intervals are  0.158 0.169
## BIC =  2414.51
## Fit based upon off diagonal values = 0.97
## Measures of factor score adequacy             
##                                                   WLS1 WLS3 WLS2
## Correlation of (regression) scores with factors   0.95 0.91 0.90
## Multiple R square of scores with factors          0.91 0.84 0.81
## Minimum correlation of possible factor scores     0.81 0.67 0.62
### model based reliability score
omega(datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")])

## Omega 
## Call: omegah(m = m, nfactors = nfactors, fm = fm, key = key, flip = flip, 
##     digits = digits, title = title, sl = sl, labels = labels, 
##     plot = plot, n.obs = n.obs, rotate = rotate, Phi = Phi, option = option, 
##     covar = covar)
## Alpha:                 0.76 
## G.6:                   0.81 
## Omega Hierarchical:    0.45 
## Omega H asymptotic:    0.54 
## Omega Total            0.83 
## 
## Schmid Leiman Factor loadings greater than  0.2 
##             g   F1*   F2*   F3*   h2   u2   p2
## SSkDe_a  0.43        0.53       0.47 0.53 0.40
## SSkDe_b  0.37        0.38       0.28 0.72 0.48
## SSkDe_c  0.42        0.43       0.37 0.63 0.48
## SSkDe_d  0.49        0.56       0.56 0.44 0.43
## SSkMa_a  0.43  0.67             0.63 0.37 0.29
## SSkMa_b  0.35  0.44             0.32 0.68 0.38
## SSkMa_c  0.42  0.57             0.51 0.49 0.35
## SSkMa_d  0.45  0.70             0.70 0.30 0.28
## SBezMs_a 0.27              0.71 0.57 0.43 0.13
## SBezMs_b 0.20              0.50 0.29 0.71 0.14
## SBezMs_c 0.22              0.55 0.35 0.65 0.14
## SBezMs_d 0.20              0.49 0.28 0.72 0.14
## 
## With eigenvalues of:
##    g  F1*  F2*  F3* 
## 1.63 1.48 0.93 1.30 
## 
## general/max  1.1   max/min =   1.59
## mean percent general =  0.3    with sd =  0.14 and cv of  0.45 
## Explained Common Variance of the general factor =  0.31 
## 
## The degrees of freedom are 33  and the fit is  0.43 
## The number of observations was  3005  with Chi Square =  1288.18  with prob <  1.3e-249
## The root mean square of the residuals is  0.05 
## The df corrected root mean square of the residuals is  0.07
## RMSEA index =  0.113  and the 10 % confidence intervals are  0.107 0.118
## BIC =  1023.92
## 
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 54  and the fit is  2.02 
## The number of observations was  3005  with Chi Square =  6062.27  with prob <  0
## The root mean square of the residuals is  0.17 
## The df corrected root mean square of the residuals is  0.19 
## 
## RMSEA index =  0.192  and the 10 % confidence intervals are  0.188 0.197
## BIC =  5629.84 
## 
## Measures of factor score adequacy             
##                                                   g  F1*   F2*  F3*
## Correlation of scores with factors             0.70 0.81  0.69 0.81
## Multiple R square of scores with factors       0.49 0.65  0.48 0.66
## Minimum correlation of factor score estimates -0.03 0.30 -0.05 0.32
## 
##  Total, General and Subset omega for each subset
##                                                  g  F1*  F2*  F3*
## Omega total for total scores and subscales    0.83 0.82 0.74 0.70
## Omega general for total scores and subscales  0.45 0.26 0.33 0.10
## Omega group for total scores and subscales    0.36 0.56 0.41 0.61

If the number of factors to extract is unclear, a scree plot combined with a parallel analysis is useful:

efa3 <- fa.parallel(x = datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_|^SSkDe_|^SBezMs_")], fa = "fa",n.iter=50)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA
efa3
## Call: fa.parallel(x = datenLV[, str_subset(string = colnames(datenLV), 
##     pattern = "^SSkMa_|^SSkDe_|^SBezMs_")], fa = "fa", n.iter = 50)
## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA 
## 
##  Eigen Values of 
## 
##  eigen values of factors
##  [1]  2.73  1.05  0.83  0.29 -0.10 -0.15 -0.20 -0.22 -0.31 -0.36 -0.42 -0.42
## 
##  eigen values of simulated factors
##  [1]  0.31  0.08  0.06  0.05  0.03  0.02  0.00 -0.01 -0.03 -0.05 -0.06 -0.09
## 
##  eigen values of components 
##  [1] 3.43 1.89 1.60 1.04 0.78 0.62 0.61 0.49 0.47 0.45 0.33 0.28
## 
##  eigen values of simulated components
## [1] NA
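
The idea behind parallel analysis can be sketched in base R: eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from uncorrelated random data of the same size. This sketch uses simulated data and component eigenvalues (Horn's original variant); fa.parallel() with fa = "fa" works on factor eigenvalues instead, so the numbers are not directly comparable. All object names are hypothetical.

```r
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- 0.3 * f1 + sqrt(1 - 0.3^2) * rnorm(n)

# Hypothetical data: three items per factor
dat <- cbind(sapply(1:3, function(i) 0.8 * f1 + rnorm(n, sd = 0.6)),
             sapply(1:3, function(i) 0.8 * f2 + rnorm(n, sd = 0.6)))
p <- ncol(dat)

observed <- eigen(cor(dat))$values

# Eigenvalues of uncorrelated normal data with the same dimensions
n_iter <- 50
simulated <- replicate(n_iter, eigen(cor(matrix(rnorm(n * p), nrow = n)))$values)
threshold <- apply(simulated, 1, quantile, probs = 0.95)

# Retain the leading components whose eigenvalue exceeds the random-data threshold
exceeds <- observed > threshold
n_retain <- if (all(exceeds)) length(exceeds) else which(!exceeds)[1] - 1
n_retain
```

Using the 95th percentile of the simulated eigenvalues instead of their mean is a common, slightly more conservative choice.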

3.2.1 classical test models / measurement invariance

In the following we take a closer look at the items on mathematics self-concept. Since neither the tau-equivalent model nor unidimensionality has been tested yet, we provisionally compute only McDonald’s omega:

psych::omega(datenLV[,str_subset(string = colnames(datenLV), pattern = "^SSkMa_")], nfactors = 1)
## Omega_h for 1 factor is not meaningful, just omega_t
## Warning in schmid(m, nfactors, fm, digits, rotate = rotate, n.obs = n.obs, :
## Omega_h and Omega_asymptotic are not meaningful with one factor
## Omega 
## Call: omegah(m = m, nfactors = nfactors, fm = fm, key = key, flip = flip, 
##     digits = digits, title = title, sl = sl, labels = labels, 
##     plot = plot, n.obs = n.obs, rotate = rotate, Phi = Phi, option = option, 
##     covar = covar)
## Alpha:                 0.81 
## G.6:                   0.78 
## Omega Hierarchical:    0.82 
## Omega H asymptotic:    1 
## Omega Total            0.82 
## 
## Schmid Leiman Factor loadings greater than  0.2 
##            g  F1*   h2   u2 p2
## SSkMa_a 0.81      0.65 0.35  1
## SSkMa_b 0.56      0.31 0.69  1
## SSkMa_c 0.70      0.49 0.51  1
## SSkMa_d 0.83      0.69 0.31  1
## 
## With eigenvalues of:
##   g F1* 
## 2.1 0.0 
## 
## general/max  3.862933e+16   max/min =   1
## mean percent general =  1    with sd =  0 and cv of  0 
## Explained Common Variance of the general factor =  1 
## 
## The degrees of freedom are 2  and the fit is  0.05 
## The number of observations was  3005  with Chi Square =  143.45  with prob <  7.1e-32
## The root mean square of the residuals is  0.04 
## The df corrected root mean square of the residuals is  0.07
## RMSEA index =  0.153  and the 10 % confidence intervals are  0.133 0.175
## BIC =  127.43
## 
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 2  and the fit is  0.05 
## The number of observations was  3005  with Chi Square =  143.45  with prob <  7.1e-32
## The root mean square of the residuals is  0.04 
## The df corrected root mean square of the residuals is  0.07 
## 
## RMSEA index =  0.153  and the 10 % confidence intervals are  0.133 0.175
## BIC =  127.43 
## 
## Measures of factor score adequacy             
##                                                  g F1*
## Correlation of scores with factors            0.92   0
## Multiple R square of scores with factors      0.85   0
## Minimum correlation of factor score estimates 0.69  -1
## 
##  Total, General and Subset omega for each subset
##                                                  g  F1*
## Omega total for total scores and subscales    0.82 0.82
## Omega general for total scores and subscales  0.82 0.82
## Omega group for total scores and subscales    0.00 0.00

To test for measurement invariance, the measurement model has to be specified in a so-called model syntax, which is then passed to the R package lavaan:

Caution: the CFAs here are estimated with maximum likelihood (ML). The section on CFA / SEM further below uses an estimation method better suited to these data; ML is nevertheless appropriate here because it allows nested models to be compared with a likelihood ratio test:

classical test models (a prerequisite for reliability analyses and a matter of model parsimony; same principle as the measurement invariance tests further below)

cong.model <- '
SSmath =~ lam1*SSkMa_a + lam2*SSkMa_b + lam3*SSkMa_c + lam4*SSkMa_d

SSkMa_a ~~ var1*SSkMa_a
SSkMa_b ~~ var2*SSkMa_b
SSkMa_c ~~ var3*SSkMa_c
SSkMa_d ~~ var4*SSkMa_d

SSkMa_a ~ mean1*1
SSkMa_b ~ mean2*1
SSkMa_c ~ mean3*1
SSkMa_d ~ mean4*1
'

# identification: Fixed factor
cong.fit <- sem(cong.model, data = datenLV, std.lv = TRUE)
summary(cong.fit, standardized = TRUE, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 16 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
##                                                       
##                                                   Used       Total
##   Number of observations                          2843        3005
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               124.430
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              4111.594
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.970
##   Tucker-Lewis Index (TLI)                       0.911
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -12967.449
##   Loglikelihood unrestricted model (H1)     -12905.234
##                                                       
##   Akaike (AIC)                               25958.898
##   Bayesian (BIC)                             26030.329
##   Sample-size adjusted Bayesian (BIC)        25992.201
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.147
##   90 Percent confidence interval - lower         0.125
##   90 Percent confidence interval - upper         0.169
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.028
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath =~                                                             
##     SSkMa_a (lam1)    0.733    0.016   45.845    0.000    0.733    0.786
##     SSkMa_b (lam2)    0.603    0.020   29.672    0.000    0.603    0.554
##     SSkMa_c (lam3)    0.582    0.014   40.830    0.000    0.582    0.717
##     SSkMa_d (lam4)    0.674    0.013   49.973    0.000    0.674    0.841
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a (men1)    3.177    0.017  181.586    0.000    3.177    3.406
##    .SSkMa_b (men2)    2.943    0.020  144.193    0.000    2.943    2.704
##    .SSkMa_c (men3)    3.319    0.015  217.876    0.000    3.319    4.086
##    .SSkMa_d (men4)    3.337    0.015  221.820    0.000    3.337    4.160
##     SSmath            0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a (var1)    0.332    0.013   25.253    0.000    0.332    0.382
##    .SSkMa_b (var2)    0.820    0.024   34.594    0.000    0.820    0.693
##    .SSkMa_c (var3)    0.320    0.011   29.946    0.000    0.320    0.486
##    .SSkMa_d (var4)    0.189    0.009   19.857    0.000    0.189    0.293
##     SSmath            1.000                               1.000    1.000
semPlot::semPaths(object = cong.fit, what = "est")
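
With the unstandardized estimates of the congeneric model, omega can be reproduced by hand. The sketch below copies the loadings and residual variances from the summary above; the object names are hypothetical, and the formula assumes a single factor with uncorrelated residuals.

```r
# Unstandardized estimates copied from the congeneric model summary above
lambda <- c(SSkMa_a = 0.733, SSkMa_b = 0.603, SSkMa_c = 0.582, SSkMa_d = 0.674)
theta  <- c(SSkMa_a = 0.332, SSkMa_b = 0.820, SSkMa_c = 0.320, SSkMa_d = 0.189)

# The factor variance is fixed to 1 (std.lv = TRUE), so the true-score variance
# of the sum score is the squared sum of the loadings
omega_total <- sum(lambda)^2 / (sum(lambda)^2 + sum(theta))
round(omega_total, 2)
```

The result is about 0.80, close to the omega total of 0.82 reported by psych::omega() above; small differences are to be expected because the two functions estimate the factor model differently and do not necessarily use the same cases.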

tauequi.model <- '
SSmath =~ lam1*SSkMa_a + lam2*SSkMa_b + lam3*SSkMa_c + lam4*SSkMa_d

SSkMa_a ~~ var1*SSkMa_a
SSkMa_b ~~ var2*SSkMa_b
SSkMa_c ~~ var3*SSkMa_c
SSkMa_d ~~ var4*SSkMa_d

SSkMa_a ~ mean1*1
SSkMa_b ~ mean2*1
SSkMa_c ~ mean3*1
SSkMa_d ~ mean4*1

# fix variance of SSmath factor
SSmath ~~ 1*SSmath

# constraints
lam1 == lam2
lam2 == lam3
lam3 == lam4
'

# identification: Fixed factor
tauequi.fit <- sem(tauequi.model, data = datenLV, std.lv = TRUE)
summary(tauequi.fit, standardized = TRUE, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 12 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
##   Number of equality constraints                     3
##                                                       
##                                                   Used       Total
##   Number of observations                          2843        3005
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               209.808
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              4111.594
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.950
##   Tucker-Lewis Index (TLI)                       0.940
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -13010.138
##   Loglikelihood unrestricted model (H1)     -12905.234
##                                                       
##   Akaike (AIC)                               26038.276
##   Bayesian (BIC)                             26091.849
##   Sample-size adjusted Bayesian (BIC)        26063.253
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.120
##   90 Percent confidence interval - lower         0.106
##   90 Percent confidence interval - upper         0.134
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.065
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath =~                                                             
##     SSkMa_a (lam1)    0.655    0.010   63.090    0.000    0.655    0.734
##     SSkMa_b (lam2)    0.655    0.010   63.090    0.000    0.655    0.586
##     SSkMa_c (lam3)    0.655    0.010   63.090    0.000    0.655    0.767
##     SSkMa_d (lam4)    0.655    0.010   63.090    0.000    0.655    0.830
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a (men1)    3.177    0.017  189.961    0.000    3.177    3.563
##    .SSkMa_b (men2)    2.943    0.021  140.437    0.000    2.943    2.634
##    .SSkMa_c (men3)    3.319    0.016  207.225    0.000    3.319    3.886
##    .SSkMa_d (men4)    3.337    0.015  225.422    0.000    3.337    4.228
##     SSmath            0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a (var1)    0.366    0.012   30.452    0.000    0.366    0.461
##    .SSkMa_b (var2)    0.820    0.024   34.536    0.000    0.820    0.657
##    .SSkMa_c (var3)    0.300    0.010   28.781    0.000    0.300    0.412
##    .SSkMa_d (var4)    0.194    0.008   23.975    0.000    0.194    0.312
##     SSmath            1.000                               1.000    1.000
## 
## Constraints:
##                                                |Slack|
##     lam1 - (lam2)                                0.000
##     lam2 - (lam3)                                0.000
##     lam3 - (lam4)                                0.000
anova(cong.fit, tauequi.fit) # LRT
## Chi-Squared Difference Test
## 
##             Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## cong.fit     2 25959 26030 124.43                                  
## tauequi.fit  5 26038 26092 209.81     85.378       3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
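
The chi-square difference test can be reproduced from the two summaries above. The sketch copies the test statistics, degrees of freedom, and log-likelihoods from the output; the object names are hypothetical.

```r
# Values copied from the summaries of cong.fit and tauequi.fit above
chisq_cong <- 124.430
df_cong <- 2
chisq_tauequi <- 209.808
df_tauequi <- 5

chisq_diff <- chisq_tauequi - chisq_cong
df_diff <- df_tauequi - df_cong
p_value <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
c(chisq_diff = chisq_diff, df_diff = df_diff, p_value = p_value)

# The same statistic as a likelihood ratio: twice the difference
# between the two user-model log-likelihoods
lr_stat <- 2 * (-12967.449 - (-13010.138))
lr_stat
```

A significant difference means the equality constraints on the loadings worsen the fit, so the tau-equivalent model is rejected in favour of the congeneric model.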
fit.stats <- rbind(fitmeasures(cong.fit, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")),
fitmeasures(tauequi.fit, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")))
rownames(fit.stats) <- c("congeneric", "tau-equivalent")
fit.stats
##                   chisq df     rmsea       tli       cfi      aic
## congeneric     124.4304  2 0.1467375 0.9105389 0.9701796 25958.90
## tau-equivalent 209.8084  5 0.1200330 0.9401377 0.9501148 26038.28
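
The fit indices in the table above follow directly from the chi-square statistics. The sketch below recomputes RMSEA, TLI, and CFI for the first and second row from the model and baseline chi-square values; `fit_by_hand()` is a hypothetical helper, and the RMSEA here divides by N, which reproduces the values above (other software may use N - 1).

```r
n_obs <- 2843           # "Used" observations in the summaries above
chisq_base <- 4111.594  # baseline model test statistic
df_base <- 6

fit_by_hand <- function(chisq, df) {
  rmsea <- sqrt(max(chisq - df, 0) / (df * n_obs))
  cfi <- 1 - max(chisq - df, 0) / max(chisq_base - df_base, chisq - df, 0)
  tli <- ((chisq_base / df_base) - (chisq / df)) / ((chisq_base / df_base) - 1)
  c(rmsea = rmsea, tli = tli, cfi = cfi)
}

round(rbind(first_row = fit_by_hand(124.4304, 2),
            second_row = fit_by_hand(209.8084, 5)), 3)
```

Both TLI and RMSEA reward parsimony through the degrees of freedom, which is why the more restricted model has the better TLI and RMSEA despite its higher chi-square.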
parallel.model <- '
SSmath =~ lam1*SSkMa_a + lam2*SSkMa_b + lam3*SSkMa_c + lam4*SSkMa_d

SSkMa_a ~~ var1*SSkMa_a
SSkMa_b ~~ var2*SSkMa_b
SSkMa_c ~~ var3*SSkMa_c
SSkMa_d ~~ var4*SSkMa_d

SSkMa_a ~ mean1*1
SSkMa_b ~ mean2*1
SSkMa_c ~ mean3*1
SSkMa_d ~ mean4*1

# fix variance of SSmath factor
SSmath ~~ 1*SSmath

# constraints
lam1 == lam2
lam2 == lam3
lam3 == lam4

var1 == var2
var2 == var3
var3 == var4
'

# identification: Fixed factor
parallel.fit <- sem(parallel.model, data = datenLV, std.lv = TRUE)
summary(parallel.fit, standardized = TRUE, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 4 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
##   Number of equality constraints                     6
##                                                       
##                                                   Used       Total
##   Number of observations                          2843        3005
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                              1157.170
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              4111.594
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.720
##   Tucker-Lewis Index (TLI)                       0.790
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -13483.819
##   Loglikelihood unrestricted model (H1)     -12905.234
##                                                       
##   Akaike (AIC)                               26979.637
##   Bayesian (BIC)                             27015.353
##   Sample-size adjusted Bayesian (BIC)        26996.289
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.225
##   90 Percent confidence interval - lower         0.214
##   90 Percent confidence interval - upper         0.236
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.144
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath =~                                                             
##     SSkMa_a (lam1)    0.648    0.011   59.955    0.000    0.648    0.708
##     SSkMa_b (lam2)    0.648    0.011   59.955    0.000    0.648    0.708
##     SSkMa_c (lam3)    0.648    0.011   59.955    0.000    0.648    0.708
##     SSkMa_d (lam4)    0.648    0.011   59.955    0.000    0.648    0.708
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a (men1)    3.177    0.017  184.882    0.000    3.177    3.467
##    .SSkMa_b (men2)    2.943    0.017  171.290    0.000    2.943    3.213
##    .SSkMa_c (men3)    3.319    0.017  193.132    0.000    3.319    3.622
##    .SSkMa_d (men4)    3.337    0.017  194.196    0.000    3.337    3.642
##     SSmath            0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a (var1)    0.419    0.006   65.303    0.000    0.419    0.499
##    .SSkMa_b (var2)    0.419    0.006   65.303    0.000    0.419    0.499
##    .SSkMa_c (var3)    0.419    0.006   65.303    0.000    0.419    0.499
##    .SSkMa_d (var4)    0.419    0.006   65.303    0.000    0.419    0.499
##     SSmath            1.000                               1.000    1.000
## 
## Constraints:
##                                                |Slack|
##     lam1 - (lam2)                                0.000
##     lam2 - (lam3)                                0.000
##     lam3 - (lam4)                                0.000
##     var1 - (var2)                                0.000
##     var2 - (var3)                                0.000
##     var3 - (var4)                                0.000
anova(cong.fit, tauequi.fit, parallel.fit) # LRT
## Chi-Squared Difference Test
## 
##              Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)    
## cong.fit      2 25959 26030  124.43                                  
## tauequi.fit   5 26038 26092  209.81      85.38       3  < 2.2e-16 ***
## parallel.fit  8 26980 27015 1157.17     947.36       3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
semPlot::semPaths(object = parallel.fit, what = "est")
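
The degrees of freedom of the three models (2, 5, and 8) follow from counting observed moments and free parameters. A short sketch of that bookkeeping, with hypothetical object names:

```r
p <- 4                            # number of items
n_moments <- p * (p + 1) / 2 + p  # variances, covariances, and means

# Free parameters: loadings + residual variances + intercepts
# (the factor variance is fixed to 1)
n_par <- c(congeneric     = 4 + 4 + 4,
           tau_equivalent = 1 + 4 + 4,  # one common loading
           parallel       = 1 + 1 + 4)  # plus one common residual variance

n_moments - n_par
```

Each set of three equality constraints frees three degrees of freedom, which matches the "Df diff" column of the likelihood ratio test above.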

measurement invariance (longitudinal data, multi-group analysis)

for help with interpreting the results, see: https://rstudio-pubs-static.s3.amazonaws.com/194879_192b64ad567743d392b559d650b95a3b.html

CFAmodel <- ' SSmath  =~ SSkMa_a  + SSkMa_b  + SSkMa_c + SSkMa_d'
fit <- cfa(CFAmodel, data=datenLV) # note: estimated with ML (the lavaan default)
summary(fit, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 19 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##                                                   Used       Total
##   Number of observations                          2843        3005
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               124.430
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              4111.594
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.970
##   Tucker-Lewis Index (TLI)                       0.911
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -12967.449
##   Loglikelihood unrestricted model (H1)     -12905.234
##                                                       
##   Akaike (AIC)                               25950.898
##   Bayesian (BIC)                             25998.519
##   Sample-size adjusted Bayesian (BIC)        25973.100
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.147
##   90 Percent confidence interval - lower         0.125
##   90 Percent confidence interval - upper         0.169
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.033
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b           0.823    0.029   28.041    0.000
##     SSkMa_c           0.794    0.022   36.756    0.000
##     SSkMa_d           0.920    0.023   40.728    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.332    0.013   25.253    0.000
##    .SSkMa_b           0.820    0.024   34.594    0.000
##    .SSkMa_c           0.320    0.011   29.946    0.000
##    .SSkMa_d           0.189    0.009   19.857    0.000
##     SSmath            0.538    0.023   22.923    0.000
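The RMSEA reported above can be reproduced by hand from the test statistic, its degrees of freedom and the number of observations used. This is a minimal sketch, assuming the common point-estimate formula sqrt(max(chisq - df, 0) / (df * N)); `rmsea_by_hand` is a hypothetical helper and not a lavaan function, and some programs divide by N - 1 instead of N.

```r
# Hypothetical helper (not part of lavaan): RMSEA point estimate for a
# single-group model, assuming sqrt(max(chisq - df, 0) / (df * n)).
rmsea_by_hand <- function(chisq, df, n) {
  sqrt(max(chisq - df, 0) / (df * n))
}

# Values copied from the summary(fit) output above
rmsea_fit <- rmsea_by_hand(chisq = 124.430, df = 2, n = 2843)
round(rmsea_fit, 3)
```

With only 2 degrees of freedom the denominator is small, which is one plausible reason why the RMSEA looks poor here while CFI and SRMR look acceptable.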
table(datenLV$Emigr)
## 
##     Mig keinMig 
##     493    1966
configural <- cfa(CFAmodel, data=datenLV, group = "Emigr")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: group variable 'Emigr' contains missing values
summary(configural, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 35 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        24
##                                                       
##   Number of observations per group:               Used       Total
##     keinMig                                       1878        1966
##     Mig                                            465         493
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                98.189
##   Degrees of freedom                                 4
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     keinMig                                     72.077
##     Mig                                         26.112
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3389.720
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.972
##   Tucker-Lewis Index (TLI)                       0.916
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -10511.368
##   Loglikelihood unrestricted model (H1)     -10462.274
##                                                       
##   Akaike (AIC)                               21070.736
##   Bayesian (BIC)                             21208.957
##   Sample-size adjusted Bayesian (BIC)        21132.704
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.142
##   90 Percent confidence interval - lower         0.118
##   90 Percent confidence interval - upper         0.167
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.027
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [keinMig]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b           0.865    0.037   23.226    0.000
##     SSkMa_c           0.814    0.027   29.652    0.000
##     SSkMa_d           0.931    0.028   32.753    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           3.238    0.021  155.739    0.000
##    .SSkMa_b           3.013    0.025  122.777    0.000
##    .SSkMa_c           3.356    0.018  184.052    0.000
##    .SSkMa_d           3.377    0.018  189.921    0.000
##     SSmath            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.324    0.015   21.236    0.000
##    .SSkMa_b           0.766    0.027   27.915    0.000
##    .SSkMa_c           0.301    0.012   24.207    0.000
##    .SSkMa_d           0.171    0.011   15.854    0.000
##     SSmath            0.488    0.027   18.289    0.000
## 
## 
## Group 2 [Mig]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b           0.758    0.066   11.542    0.000
##     SSkMa_c           0.699    0.050   14.064    0.000
##     SSkMa_d           0.916    0.053   17.127    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           3.069    0.044   69.482    0.000
##    .SSkMa_b           2.798    0.051   55.224    0.000
##    .SSkMa_c           3.239    0.039   82.839    0.000
##    .SSkMa_d           3.241    0.040   80.897    0.000
##     SSmath            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.280    0.033    8.428    0.000
##    .SSkMa_b           0.833    0.059   14.087    0.000
##    .SSkMa_c           0.404    0.031   13.177    0.000
##    .SSkMa_d           0.219    0.027    8.049    0.000
##     SSmath            0.628    0.063    9.963    0.000
weak.invariance <- cfa(CFAmodel, data=datenLV, group = "Emigr", group.equal = "loadings")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: group variable 'Emigr' contains missing values
summary(weak.invariance, fit.measures = TRUE)
## lavaan 0.6-8 ended normally after 27 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        24
##   Number of equality constraints                     3
##                                                       
##   Number of observations per group:               Used       Total
##     keinMig                                       1878        1966
##     Mig                                            465         493
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               104.031
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     keinMig                                     73.273
##     Mig                                         30.758
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3389.720
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.971
##   Tucker-Lewis Index (TLI)                       0.951
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -10514.289
##   Loglikelihood unrestricted model (H1)     -10462.274
##                                                       
##   Akaike (AIC)                               21070.578
##   Bayesian (BIC)                             21191.521
##   Sample-size adjusted Bayesian (BIC)        21124.800
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.109
##   90 Percent confidence interval - lower         0.091
##   90 Percent confidence interval - upper         0.128
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.032
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [keinMig]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b (.p2.)    0.840    0.032   25.944    0.000
##     SSkMa_c (.p3.)    0.790    0.024   32.950    0.000
##     SSkMa_d (.p4.)    0.927    0.025   37.016    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           3.238    0.021  154.981    0.000
##    .SSkMa_b           3.013    0.024  123.393    0.000
##    .SSkMa_c           3.356    0.018  185.466    0.000
##    .SSkMa_d           3.377    0.018  189.351    0.000
##     SSmath            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.322    0.015   21.310    0.000
##    .SSkMa_b           0.768    0.027   28.123    0.000
##    .SSkMa_c           0.304    0.012   24.751    0.000
##    .SSkMa_d           0.169    0.011   16.002    0.000
##     SSmath            0.498    0.026   19.451    0.000
## 
## 
## Group 2 [Mig]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b (.p2.)    0.840    0.032   25.944    0.000
##     SSkMa_c (.p3.)    0.790    0.024   32.950    0.000
##     SSkMa_d (.p4.)    0.927    0.025   37.016    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           3.069    0.043   70.625    0.000
##    .SSkMa_b           2.798    0.052   54.123    0.000
##    .SSkMa_c           3.239    0.040   80.088    0.000
##    .SSkMa_d           3.241    0.040   81.950    0.000
##     SSmath            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.292    0.029   10.086    0.000
##    .SSkMa_b           0.829    0.059   14.044    0.000
##    .SSkMa_c           0.395    0.030   12.983    0.000
##    .SSkMa_d           0.223    0.024    9.499    0.000
##     SSmath            0.586    0.049   11.977    0.000
anova(weak.invariance, configural) # LRT: likelihood-ratio (chi-square difference) test
## Chi-Squared Difference Test
## 
##                 Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## configural       4 21071 21209  98.189                              
## weak.invariance  7 21071 21192 104.031     5.8415       3     0.1196
fit.stats <- rbind(
  fitmeasures(configural, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")),
  fitmeasures(weak.invariance, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic"))
)
rownames(fit.stats) <- c("configural", "weak invariance")
fit.stats
##                     chisq df     rmsea       tli       cfi      aic
## configural       98.18934  4 0.1417750 0.9163436 0.9721145 21070.74
## weak invariance 104.03081  7 0.1087764 0.9507542 0.9712733 21070.58
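Besides the chi-square difference test, the change in descriptive fit indices between nested invariance models is often inspected. The sketch below only re-uses the numbers printed in the table above. The cutoff mentioned in the comment (a CFI decrease of more than .01 as a warning sign) is a rule of thumb added here as an assumption and is not part of the original notes.

```r
# Fit indices copied from the fit.stats table above
fit_tab <- data.frame(
  model = c("configural", "weak invariance"),
  rmsea = c(0.1417750, 0.1087764),
  cfi   = c(0.9721145, 0.9712733),
  aic   = c(21070.74, 21070.58)
)

# Change from the configural to the weak-invariance model
idx   <- c("rmsea", "cfi", "aic")
delta <- fit_tab[2, idx] - fit_tab[1, idx]
rownames(delta) <- "weak - configural"

# Assumed rule of thumb: a CFI decrease larger than .01 would be a warning sign
round(delta, 4)
```

Here the CFI hardly changes and the RMSEA and AIC decrease, which is in line with the non-significant likelihood-ratio test for equal loadings.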
strong.invariance <- cfa(CFAmodel, data=datenLV, group = "Emigr", group.equal = c( "loadings", "intercepts"))
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: group variable 'Emigr' contains missing values
summary(strong.invariance, fit.measures = TRUE)
## lavaan 0.6-8 ended normally after 39 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        25
##   Number of equality constraints                     7
##                                                       
##   Number of observations per group:               Used       Total
##     keinMig                                       1878        1966
##     Mig                                            465         493
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               107.336
##   Degrees of freedom                                10
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     keinMig                                     73.780
##     Mig                                         33.556
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3389.720
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.971
##   Tucker-Lewis Index (TLI)                       0.965
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -10515.941
##   Loglikelihood unrestricted model (H1)     -10462.274
##                                                       
##   Akaike (AIC)                               21067.883
##   Bayesian (BIC)                             21171.548
##   Sample-size adjusted Bayesian (BIC)        21114.359
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.091
##   90 Percent confidence interval - lower         0.076
##   90 Percent confidence interval - upper         0.107
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.032
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [keinMig]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b (.p2.)    0.844    0.032   26.189    0.000
##     SSkMa_c (.p3.)    0.788    0.024   33.102    0.000
##     SSkMa_d (.p4.)    0.925    0.025   37.264    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a (.10.)    3.237    0.020  159.328    0.000
##    .SSkMa_b (.11.)    2.999    0.023  130.339    0.000
##    .SSkMa_c (.12.)    3.358    0.017  192.037    0.000
##    .SSkMa_d (.13.)    3.380    0.018  192.150    0.000
##     SSmath            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.321    0.015   21.285    0.000
##    .SSkMa_b           0.767    0.027   28.090    0.000
##    .SSkMa_c           0.304    0.012   24.775    0.000
##    .SSkMa_d           0.170    0.011   16.109    0.000
##     SSmath            0.499    0.026   19.521    0.000
## 
## 
## Group 2 [Mig]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SSmath =~                                           
##     SSkMa_a           1.000                           
##     SSkMa_b (.p2.)    0.844    0.032   26.189    0.000
##     SSkMa_c (.p3.)    0.788    0.024   33.102    0.000
##     SSkMa_d (.p4.)    0.925    0.025   37.264    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a (.10.)    3.237    0.020  159.328    0.000
##    .SSkMa_b (.11.)    2.999    0.023  130.339    0.000
##    .SSkMa_c (.12.)    3.358    0.017  192.037    0.000
##    .SSkMa_d (.13.)    3.380    0.018  192.150    0.000
##     SSmath           -0.164    0.042   -3.867    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SSkMa_a           0.291    0.029   10.062    0.000
##    .SSkMa_b           0.832    0.059   14.035    0.000
##    .SSkMa_c           0.395    0.030   12.989    0.000
##    .SSkMa_d           0.224    0.024    9.543    0.000
##     SSmath            0.587    0.049   11.990    0.000
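Under strong invariance the latent mean of the first group is fixed to 0, so the latent mean of the second group (-0.164) is the group difference on the latent scale. The sketch below expresses it in standard-deviation units. Both scalings are illustrative choices made here, not something the original notes prescribe.

```r
# Estimates copied from the strong-invariance output above
latent_mean_mig  <- -0.164  # latent mean of SSmath in group Mig (keinMig fixed to 0)
latent_var_nomig <- 0.499   # latent variance of SSmath in group keinMig
latent_var_mig   <- 0.587   # latent variance of SSmath in group Mig

# Difference in SD units of the reference group (keinMig)
d_reference <- latent_mean_mig / sqrt(latent_var_nomig)

# Same difference using the simple average of both latent variances
d_average <- latent_mean_mig / sqrt((latent_var_nomig + latent_var_mig) / 2)

round(c(reference = d_reference, average = d_average), 2)
```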
anova(strong.invariance, weak.invariance, configural)
## Chi-Squared Difference Test
## 
##                   Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)
## configural         4 21071 21209  98.189                              
## weak.invariance    7 21071 21192 104.031     5.8415       3     0.1196
## strong.invariance 10 21068 21172 107.336     3.3049       3     0.3470
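The degrees of freedom in the table above can be checked by counting observed moments and free parameters. The sketch below assumes the mean structure is included (lavaan does this in multi-group models) and takes the parameter and constraint counts from the three summaries above.

```r
# Observed moments per group: p means + p(p + 1)/2 variances and covariances
p         <- 4   # indicators SSkMa_a ... SSkMa_d
n_groups  <- 2   # keinMig, Mig
n_moments <- n_groups * (p + p * (p + 1) / 2)

# "Number of model parameters" and "Number of equality constraints"
# as printed in the three summaries above
models <- data.frame(
  model       = c("configural", "weak.invariance", "strong.invariance"),
  parameters  = c(24, 24, 25),
  constraints = c(0, 3, 7)
)
models$free <- models$parameters - models$constraints
models$df   <- n_moments - models$free
models
```

The strong-invariance model has one parameter more than the other two because the latent mean in the second group is freed once the intercepts are constrained to be equal.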
strict.invariance <- cfa(CFAmodel, data=datenLV, group = "Emigr", group.equal = c( "loadings", "intercepts", "residuals"))
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: group variable 'Emigr' contains missing values
anova(strict.invariance, strong.invariance, weak.invariance, configural)
## Chi-Squared Difference Test
## 
##                   Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)   
## configural         4 21071 21209  98.189                                 
## weak.invariance    7 21071 21192 104.031     5.8415       3   0.119583   
## strong.invariance 10 21068 21172 107.336     3.3049       3   0.346958   
## strict.invariance 14 21077 21158 124.388    17.0527       4   0.001888 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
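The chi-square difference tests in the table above can be reproduced with base R. This sketch assumes the standard (unscaled) ML test statistic, for which the difference of two nested models' chi-square values is itself referred to a chi-square distribution.

```r
# Chisq and Df copied from the anova() table above
lrt <- data.frame(
  model = c("configural", "weak.invariance", "strong.invariance", "strict.invariance"),
  chisq = c(98.189, 104.031, 107.336, 124.388),
  df    = c(4, 7, 10, 14)
)

# Difference of each model to the less restricted model in the row above
lrt$chisq_diff <- c(NA, diff(lrt$chisq))
lrt$df_diff    <- c(NA, diff(lrt$df))
lrt$p_value    <- pchisq(lrt$chisq_diff, df = lrt$df_diff, lower.tail = FALSE)
lrt
```

Only the last step, from strong to strict invariance, is significant. Read this way, equal loadings and intercepts are tenable across the two groups, whereas equal residual variances are not.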

3.3 confirmatory factor analysis

This section uses the lavaan package in R (see https://lavaan.ugent.be/) and the lavaan Google group (https://groups.google.com/g/lavaan); for complex analyses, however, Mplus is recommended (FIML, Bayesian SEM, …)

The most important foundational book on SEM: Bollen (1989)

3.3.1 brief theoretical background: focus on the structural modelling perspective

deductive method

Context-Process-Input-Output model (known in German-speaking countries through Ditton (2000), developed by Stufflebeam (1971); clearly explained in Keller (2014)):

This can be assembled into a nomological network (= a test of construct validity):

Which variables might be of interest can be worked out step by step from a graphical theoretical elaboration (path diagram) (chapter 7 “causal models” in Jaccard and Jacoby (2020)):

3.3.2 CFA (measurement model)

The CFA / SEM section follows chapters 9-12 in Hair et al. (2019):

Strictly speaking, the individual measurement models (CFAs) should be estimated separately; due to time constraints, however, a single first-order CFA is estimated here directly for all measurement models used in the structural equation model:

firstorderCFA <- '
SSmath =~ SSkMa_a + SSkMa_b + SSkMa_c + SSkMa_d
SSgerman =~ SSkDe_a + SSkDe_b + SSkDe_c + SSkDe_d

SozInt =~ SBezMs_a + SBezMs_b + SBezMs_c + SBezMs_d

Abilities =~ wle_lesen + wle_hoeren + wle_mathe
'

# identification: fixed-factor method (std.lv = TRUE fixes the latent variances to 1)
fit <- sem(firstorderCFA, data = datenLV, std.lv = TRUE)
summary(fit, standardized = TRUE, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 24 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        36
##                                                       
##                                                   Used       Total
##   Number of observations                          2566        3005
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                              1514.039
##   Degrees of freedom                                84
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             11646.307
##   Degrees of freedom                               105
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.876
##   Tucker-Lewis Index (TLI)                       0.845
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -45769.003
##   Loglikelihood unrestricted model (H1)     -45011.984
##                                                       
##   Akaike (AIC)                               91610.007
##   Bayesian (BIC)                             91820.610
##   Sample-size adjusted Bayesian (BIC)        91706.228
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.081
##   90 Percent confidence interval - lower         0.078
##   90 Percent confidence interval - upper         0.085
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.054
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath =~                                                             
##     SSkMa_a           0.733    0.017   44.281    0.000    0.733    0.789
##     SSkMa_b           0.618    0.021   29.186    0.000    0.618    0.569
##     SSkMa_c           0.584    0.015   39.403    0.000    0.584    0.722
##     SSkMa_d           0.661    0.014   47.284    0.000    0.661    0.828
##   SSgerman =~                                                           
##     SSkDe_a           0.568    0.018   31.610    0.000    0.568    0.643
##     SSkDe_b           0.553    0.022   24.924    0.000    0.553    0.523
##     SSkDe_c           0.494    0.016   31.469    0.000    0.494    0.640
##     SSkDe_d           0.553    0.015   38.039    0.000    0.553    0.758
##   SozInt =~                                                             
##     SBezMs_a          0.544    0.016   34.545    0.000    0.544    0.757
##     SBezMs_b          0.495    0.019   25.382    0.000    0.495    0.553
##     SBezMs_c          0.524    0.020   26.359    0.000    0.524    0.573
##     SBezMs_d          0.429    0.018   24.247    0.000    0.429    0.530
##   Abilities =~                                                          
##     wle_lesen         0.830    0.024   34.203    0.000    0.830    0.687
##     wle_hoeren        0.618    0.021   29.300    0.000    0.618    0.597
##     wle_mathe         0.909    0.022   40.680    0.000    0.909    0.809
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath ~~                                                             
##     SSgerman          0.392    0.022   18.006    0.000    0.392    0.392
##     SozInt            0.212    0.024    8.711    0.000    0.212    0.212
##     Abilities         0.529    0.019   27.358    0.000    0.529    0.529
##   SSgerman ~~                                                           
##     SozInt            0.263    0.025   10.370    0.000    0.263    0.263
##     Abilities         0.427    0.023   18.929    0.000    0.427    0.427
##   SozInt ~~                                                             
##     Abilities         0.167    0.026    6.448    0.000    0.167    0.167
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a           0.326    0.013   24.737    0.000    0.326    0.378
##    .SSkMa_b           0.798    0.024   32.781    0.000    0.798    0.676
##    .SSkMa_c           0.313    0.011   28.688    0.000    0.313    0.478
##    .SSkMa_d           0.200    0.009   21.359    0.000    0.200    0.314
##    .SSkDe_a           0.457    0.016   27.929    0.000    0.457    0.587
##    .SSkDe_b           0.815    0.026   31.736    0.000    0.815    0.727
##    .SSkDe_c           0.350    0.012   28.040    0.000    0.350    0.590
##    .SSkDe_d           0.227    0.011   20.899    0.000    0.227    0.426
##    .SBezMs_a          0.221    0.013   17.345    0.000    0.221    0.427
##    .SBezMs_b          0.555    0.019   29.542    0.000    0.555    0.694
##    .SBezMs_c          0.560    0.019   28.764    0.000    0.560    0.671
##    .SBezMs_d          0.472    0.016   30.322    0.000    0.472    0.719
##    .wle_lesen         0.770    0.030   25.848    0.000    0.770    0.528
##    .wle_hoeren        0.690    0.023   30.039    0.000    0.690    0.643
##    .wle_mathe         0.437    0.027   16.471    0.000    0.437    0.346
##     SSmath            1.000                               1.000    1.000
##     SSgerman          1.000                               1.000    1.000
##     SozInt            1.000                               1.000    1.000
##     Abilities         1.000                               1.000    1.000
semPlot::semPaths(object = fit, what = "est")
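The Std.all column in the output above follows from the unstandardized estimates. The sketch below reproduces it for SSkMa_a; the three input values are copied from the output, everything else is plain arithmetic.

```r
# Estimates for SSkMa_a copied from the output above (std.lv = TRUE,
# so the latent variance is fixed to 1)
loading      <- 0.733
residual_var <- 0.326
latent_var   <- 1

# Model-implied variance of the indicator
implied_var <- loading^2 * latent_var + residual_var

std_loading  <- loading * sqrt(latent_var) / sqrt(implied_var)  # Std.all, Latent Variables
std_residual <- residual_var / implied_var                      # Std.all, Variances
r_squared    <- 1 - std_residual                                # explained variance

round(c(std_loading = std_loading, std_residual = std_residual, r_squared = r_squared), 3)
```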

To account for the non-normal distribution of the questionnaire items and the small sample, the DWLS estimator was used and the \(\chi^2\) statistic was mean- and variance-adjusted (e.g. chapter 11 in Hancock and Mueller (2013)):

Note: this is a limited-information approach; FIML and Bayesian SEM are available in Mplus.
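In this limited-information approach the ordinal items enter the model through thresholds and polychoric correlations. As a sketch of the first step only, the code below shows how thresholds relate to cumulative response proportions, assuming a standard normal latent response variable and no covariates. The vector `x` is toy data invented for illustration and is not taken from datenLV.

```r
# Toy responses on a four-point scale (illustrative values, not from datenLV)
x <- ordered(c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4))

# Cumulative proportion of responses at or below categories 1, 2 and 3
cum_prop <- cumsum(prop.table(table(x)))[1:3]

# Thresholds: cut points on a standard normal latent response variable
thresholds <- qnorm(cum_prop)
round(thresholds, 3)

# Reverse direction: cumulative proportions implied by the thresholds that
# lavaan reports for SSkMa_a further below (-1.438, -0.828, 0.074)
round(pnorm(c(-1.438, -0.828, 0.074)), 3)
```

A four-point item therefore contributes three thresholds instead of one intercept and one residual variance, which is why the ordinal model has more parameters than the ML model above.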

# questionnaire items (four-point scales) to be treated as ordered factors
ordinal_items <- c("SSkMa_a", "SSkMa_b", "SSkMa_c", "SSkMa_d",
                   "SSkDe_a", "SSkDe_b", "SSkDe_c", "SSkDe_d",
                   "SBezMs_a", "SBezMs_b", "SBezMs_c", "SBezMs_d")
datenLV[, ordinal_items] <- lapply(datenLV[, ordinal_items], ordered)

head(datenLV$SSkMa_a)
## [1] 3 3 3 4 3 4
## Levels: 1 < 2 < 3 < 4
firstorderCFA <- '
SSmath =~ SSkMa_a + SSkMa_b + SSkMa_c + SSkMa_d
SSgerman =~ SSkDe_a + SSkDe_b + SSkDe_c + SSkDe_d

SozInt =~ SBezMs_a + SBezMs_b + SBezMs_c + SBezMs_d

Abilities =~ wle_lesen + wle_hoeren + wle_mathe
'

# identification: marker-variable method (first loading of each factor fixed to 1)
fit <- sem(firstorderCFA, data = datenLV,
           ordered = c("SSkMa_a", "SSkMa_b", "SSkMa_c", "SSkMa_d",
                       "SSkDe_a", "SSkDe_b", "SSkDe_c", "SSkDe_d",
                       "SBezMs_a", "SBezMs_b", "SBezMs_c", "SBezMs_d"))

summary(fit, standardized = TRUE, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 37 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of model parameters                        63
##                                                       
##                                                   Used       Total
##   Number of observations                          2566        3005
##                                                                   
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              1091.263    1391.590
##   Degrees of freedom                                84          84
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  0.795
##   Shift parameter                                           19.352
##        simple second-order correction                             
## 
## Model Test Baseline Model:
## 
##   Test statistic                             39677.932   22840.150
##   Degrees of freedom                               105         105
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.741
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.975       0.942
##   Tucker-Lewis Index (TLI)                       0.968       0.928
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.068       0.078
##   90 Percent confidence interval - lower         0.065       0.074
##   90 Percent confidence interval - upper         0.072       0.082
##   P-value RMSEA <= 0.05                          0.000       0.000
##                                                                   
##   Robust RMSEA                                                  NA
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.057       0.057
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model        Unstructured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath =~                                                             
##     SSkMa_a           1.000                               0.837    0.837
##     SSkMa_b           0.837    0.018   46.522    0.000    0.701    0.701
##     SSkMa_c           0.965    0.015   66.163    0.000    0.807    0.807
##     SSkMa_d           1.059    0.015   70.858    0.000    0.886    0.886
##   SSgerman =~                                                           
##     SSkDe_a           1.000                               0.682    0.682
##     SSkDe_b           0.975    0.031   31.326    0.000    0.665    0.665
##     SSkDe_c           1.084    0.029   37.812    0.000    0.740    0.740
##     SSkDe_d           1.184    0.029   40.916    0.000    0.808    0.808
##   SozInt =~                                                             
##     SBezMs_a          1.000                               0.821    0.821
##     SBezMs_b          0.745    0.030   25.241    0.000    0.611    0.611
##     SBezMs_c          0.836    0.032   26.338    0.000    0.686    0.686
##     SBezMs_d          0.781    0.031   24.827    0.000    0.641    0.641
##   Abilities =~                                                          
##     wle_lesen         1.000                               0.811    0.672
##     wle_hoeren        0.731    0.035   20.902    0.000    0.593    0.573
##     wle_mathe         1.196    0.048   24.758    0.000    0.970    0.863
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath ~~                                                             
##     SSgerman          0.273    0.014   19.965    0.000    0.478    0.478
##     SozInt            0.166    0.017    9.494    0.000    0.241    0.241
##     Abilities         0.367    0.018   19.885    0.000    0.541    0.541
##   SSgerman ~~                                                           
##     SozInt            0.165    0.015   10.840    0.000    0.295    0.295
##     Abilities         0.253    0.016   16.022    0.000    0.457    0.457
##   SozInt ~~                                                             
##     Abilities         0.103    0.018    5.751    0.000    0.155    0.155
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a           0.000                               0.000    0.000
##    .SSkMa_b           0.000                               0.000    0.000
##    .SSkMa_c           0.000                               0.000    0.000
##    .SSkMa_d           0.000                               0.000    0.000
##    .SSkDe_a           0.000                               0.000    0.000
##    .SSkDe_b           0.000                               0.000    0.000
##    .SSkDe_c           0.000                               0.000    0.000
##    .SSkDe_d           0.000                               0.000    0.000
##    .SBezMs_a          0.000                               0.000    0.000
##    .SBezMs_b          0.000                               0.000    0.000
##    .SBezMs_c          0.000                               0.000    0.000
##    .SBezMs_d          0.000                               0.000    0.000
##    .wle_lesen         0.144    0.024    6.032    0.000    0.144    0.119
##    .wle_hoeren        0.148    0.021    7.186    0.000    0.148    0.143
##    .wle_mathe         0.151    0.022    6.788    0.000    0.151    0.134
##     SSmath            0.000                               0.000    0.000
##     SSgerman          0.000                               0.000    0.000
##     SozInt            0.000                               0.000    0.000
##     Abilities         0.000                               0.000    0.000
## 
## Thresholds:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     SSkMa_a|t1       -1.438    0.037  -39.175    0.000   -1.438   -1.438
##     SSkMa_a|t2       -0.828    0.028  -29.478    0.000   -0.828   -0.828
##     SSkMa_a|t3        0.074    0.025    3.000    0.003    0.074    0.074
##     SSkMa_b|t1       -1.072    0.031  -34.946    0.000   -1.072   -1.072
##     SSkMa_b|t2       -0.442    0.026  -17.229    0.000   -0.442   -0.442
##     SSkMa_b|t3        0.185    0.025    7.417    0.000    0.185    0.185
##     SSkMa_c|t1       -1.758    0.045  -38.947    0.000   -1.758   -1.758
##     SSkMa_c|t2       -1.088    0.031  -35.229    0.000   -1.088   -1.088
##     SSkMa_c|t3       -0.014    0.025   -0.553    0.581   -0.014   -0.014
##     SSkMa_d|t1       -1.801    0.047  -38.664    0.000   -1.801   -1.801
##     SSkMa_d|t2       -1.109    0.031  -35.599    0.000   -1.109   -1.109
##     SSkMa_d|t3       -0.044    0.025   -1.776    0.076   -0.044   -0.044
##     SSkDe_a|t1       -1.575    0.040  -39.504    0.000   -1.575   -1.575
##     SSkDe_a|t2       -0.787    0.028  -28.379    0.000   -0.787   -0.787
##     SSkDe_a|t3        0.228    0.025    9.111    0.000    0.228    0.228
##     SSkDe_b|t1       -1.090    0.031  -35.260    0.000   -1.090   -1.090
##     SSkDe_b|t2       -0.420    0.026  -16.448    0.000   -0.420   -0.420
##     SSkDe_b|t3        0.308    0.025   12.218    0.000    0.308    0.308
##     SSkDe_c|t1       -1.816    0.047  -38.552    0.000   -1.816   -1.816
##     SSkDe_c|t2       -1.175    0.032  -36.630    0.000   -1.175   -1.175
##     SSkDe_c|t3        0.081    0.025    3.276    0.001    0.081    0.081
##     SSkDe_d|t1       -2.003    0.055  -36.643    0.000   -2.003   -2.003
##     SSkDe_d|t2       -1.244    0.033  -37.544    0.000   -1.244   -1.244
##     SSkDe_d|t3        0.043    0.025    1.737    0.082    0.043    0.043
##     SBezMs_a|t1      -2.033    0.056  -36.257    0.000   -2.033   -2.033
##     SBezMs_a|t2      -1.294    0.034  -38.099    0.000   -1.294   -1.294
##     SBezMs_a|t3      -0.012    0.025   -0.474    0.636   -0.012   -0.012
##     SBezMs_b|t1      -1.495    0.038  -39.394    0.000   -1.495   -1.495
##     SBezMs_b|t2      -0.808    0.028  -28.930    0.000   -0.808   -0.808
##     SBezMs_b|t3       0.243    0.025    9.702    0.000    0.243    0.243
##     SBezMs_c|t1      -1.475    0.037  -39.329    0.000   -1.475   -1.475
##     SBezMs_c|t2      -0.999    0.030  -33.512    0.000   -0.999   -0.999
##     SBezMs_c|t3      -0.278    0.025  -11.078    0.000   -0.278   -0.278
##     SBezMs_d|t1      -1.723    0.044  -39.140    0.000   -1.723   -1.723
##     SBezMs_d|t2      -1.183    0.032  -36.742    0.000   -1.183   -1.183
##     SBezMs_d|t3      -0.456    0.026  -17.735    0.000   -0.456   -0.456
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a           0.300                               0.300    0.300
##    .SSkMa_b           0.509                               0.509    0.509
##    .SSkMa_c           0.348                               0.348    0.348
##    .SSkMa_d           0.214                               0.214    0.214
##    .SSkDe_a           0.535                               0.535    0.535
##    .SSkDe_b           0.558                               0.558    0.558
##    .SSkDe_c           0.453                               0.453    0.453
##    .SSkDe_d           0.348                               0.348    0.348
##    .SBezMs_a          0.326                               0.326    0.326
##    .SBezMs_b          0.626                               0.626    0.626
##    .SBezMs_c          0.530                               0.530    0.530
##    .SBezMs_d          0.588                               0.588    0.588
##    .wle_lesen         0.801    0.032   25.318    0.000    0.801    0.549
##    .wle_hoeren        0.720    0.025   28.933    0.000    0.720    0.672
##    .wle_mathe         0.323    0.033    9.688    0.000    0.323    0.255
##     SSmath            0.700    0.015   47.431    0.000    1.000    1.000
##     SSgerman          0.465    0.019   24.768    0.000    1.000    1.000
##     SozInt            0.674    0.028   24.229    0.000    1.000    1.000
##     Abilities         0.658    0.039   16.728    0.000    1.000    1.000
## 
## Scales y*:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     SSkMa_a           1.000                               1.000    1.000
##     SSkMa_b           1.000                               1.000    1.000
##     SSkMa_c           1.000                               1.000    1.000
##     SSkMa_d           1.000                               1.000    1.000
##     SSkDe_a           1.000                               1.000    1.000
##     SSkDe_b           1.000                               1.000    1.000
##     SSkDe_c           1.000                               1.000    1.000
##     SSkDe_d           1.000                               1.000    1.000
##     SBezMs_a          1.000                               1.000    1.000
##     SBezMs_b          1.000                               1.000    1.000
##     SBezMs_c          1.000                               1.000    1.000
##     SBezMs_d          1.000                               1.000    1.000
semPlot::semPaths(object = fit, what = "est")
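
The `Std.lv` column of the covariance block in the CFA output above can be checked by hand: a standardized latent covariance is the covariance divided by the two latent standard deviations, which makes it a correlation. The following is a minimal base-R sketch using the rounded estimates printed above; the object names `latent_cov` and `latent_cor` are chosen here for illustration and do not come from the analysis itself. Because the inputs are rounded to three decimals, the results agree with the printed column only up to rounding.

```r
# Latent covariance matrix assembled from the rounded CFA estimates above
# (latent variances on the diagonal, latent covariances off the diagonal).
lv <- c("SSmath", "SSgerman", "SozInt", "Abilities")
latent_cov <- matrix(0, 4, 4, dimnames = list(lv, lv))
diag(latent_cov) <- c(0.700, 0.465, 0.674, 0.658)
latent_cov["SSmath", "SSgerman"]    <- 0.273
latent_cov["SSmath", "SozInt"]      <- 0.166
latent_cov["SSmath", "Abilities"]   <- 0.367
latent_cov["SSgerman", "SozInt"]    <- 0.165
latent_cov["SSgerman", "Abilities"] <- 0.253
latent_cov["SozInt", "Abilities"]   <- 0.103

# Mirror the upper triangle into the lower triangle to make the matrix symmetric.
latent_cov[lower.tri(latent_cov)] <- t(latent_cov)[lower.tri(latent_cov)]

# cov2cor() rescales a covariance matrix into a correlation matrix.
latent_cor <- cov2cor(latent_cov)
round(latent_cor, 3)
```

The off-diagonal entries correspond to the `Std.lv` values of the covariance block, for example the correlation between `SSmath` and `Abilities`.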

3.3.3 SEM (measurement model + structural model)

SEMmodel <- '
  # measurement models
  SSmath    =~ SSkMa_a + SSkMa_b + SSkMa_c + SSkMa_d
  SSgerman  =~ SSkDe_a + SSkDe_b + SSkDe_c + SSkDe_d
  SozInt    =~ SBezMs_a + SBezMs_b + SBezMs_c + SBezMs_d
  Abilities =~ wle_lesen + wle_hoeren + wle_mathe

  # structural regression (Emigr and tr_sex enter as the 2 dummy variables)
  Abilities ~ SSmath + SSgerman + SozInt + Emigr + tr_sex + EHisei
'

# identification: marker variable method (first loading of each factor fixed to 1)
fit <- sem(SEMmodel, data = datenLV,
           ordered = c("SSkMa_a", "SSkMa_b", "SSkMa_c", "SSkMa_d",
                       "SSkDe_a", "SSkDe_b", "SSkDe_c", "SSkDe_d",
                       "SBezMs_a", "SBezMs_b", "SBezMs_c", "SBezMs_d"))

summary(fit, standardized = TRUE, fit.measures = TRUE)
## lavaan 0.6-8 ended normally after 39 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of model parameters                        66
##                                                       
##                                                   Used       Total
##   Number of observations                          1973        3005
##                                                                   
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              1505.862    1548.544
##   Degrees of freedom                               126         126
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  0.994
##   Shift parameter                                           33.596
##        simple second-order correction                             
## 
## Model Test Baseline Model:
## 
##   Test statistic                             27860.196   16284.077
##   Degrees of freedom                               105         105
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.715
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.950       0.912
##   Tucker-Lewis Index (TLI)                       0.959       0.927
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.075       0.076
##   90 Percent confidence interval - lower         0.071       0.072
##   90 Percent confidence interval - upper         0.078       0.079
##   P-value RMSEA <= 0.05                          0.000       0.000
##                                                                   
##   Robust RMSEA                                                  NA
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.055       0.055
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model        Unstructured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath =~                                                             
##     SSkMa_a           1.000                               0.828    0.828
##     SSkMa_b           0.830    0.021   38.856    0.000    0.688    0.688
##     SSkMa_c           0.958    0.018   54.715    0.000    0.794    0.794
##     SSkMa_d           1.072    0.018   59.008    0.000    0.887    0.887
##   SSgerman =~                                                           
##     SSkDe_a           1.000                               0.681    0.681
##     SSkDe_b           0.983    0.035   27.891    0.000    0.669    0.669
##     SSkDe_c           1.063    0.033   32.359    0.000    0.724    0.724
##     SSkDe_d           1.181    0.034   35.089    0.000    0.804    0.804
##   SozInt =~                                                             
##     SBezMs_a          1.000                               0.828    0.828
##     SBezMs_b          0.768    0.034   22.361    0.000    0.636    0.636
##     SBezMs_c          0.808    0.036   22.713    0.000    0.669    0.669
##     SBezMs_d          0.766    0.035   21.863    0.000    0.634    0.634
##   Abilities =~                                                          
##     wle_lesen         1.000                               0.814    0.693
##     wle_hoeren        0.706    0.039   17.878    0.000    0.574    0.572
##     wle_mathe         1.130    0.051   21.943    0.000    0.919    0.841
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Abilities ~                                                           
##     SSmath            0.383    0.031   12.450    0.000    0.389    0.389
##     SSgerman          0.248    0.038    6.571    0.000    0.207    0.207
##     SozInt           -0.039    0.027   -1.449    0.147   -0.040   -0.040
##     Emigr             0.369    0.051    7.206    0.000    0.454    0.178
##     tr_sex           -0.076    0.039   -1.937    0.053   -0.093   -0.046
##     EHisei            0.017    0.001   12.552    0.000    0.021    0.327
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SSmath ~~                                                             
##     SSgerman          0.279    0.016   17.807    0.000    0.495    0.495
##     SozInt            0.168    0.020    8.468    0.000    0.245    0.245
##   SSgerman ~~                                                           
##     SozInt            0.164    0.018    9.265    0.000    0.290    0.290
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a           0.000                               0.000    0.000
##    .SSkMa_b           0.000                               0.000    0.000
##    .SSkMa_c           0.000                               0.000    0.000
##    .SSkMa_d           0.000                               0.000    0.000
##    .SSkDe_a           0.000                               0.000    0.000
##    .SSkDe_b           0.000                               0.000    0.000
##    .SSkDe_c           0.000                               0.000    0.000
##    .SSkDe_d           0.000                               0.000    0.000
##    .SBezMs_a          0.000                               0.000    0.000
##    .SBezMs_b          0.000                               0.000    0.000
##    .SBezMs_c          0.000                               0.000    0.000
##    .SBezMs_d          0.000                               0.000    0.000
##    .wle_lesen        -1.602    0.152  -10.511    0.000   -1.602   -1.364
##    .wle_hoeren       -0.980    0.123   -7.956    0.000   -0.980   -0.976
##    .wle_mathe        -0.892    0.132   -6.777    0.000   -0.892   -0.816
##     SSmath            0.000                               0.000    0.000
##     SSgerman          0.000                               0.000    0.000
##     SozInt            0.000                               0.000    0.000
##    .Abilities         0.000                               0.000    0.000
## 
## Thresholds:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     SSkMa_a|t1       -1.467    0.152   -9.631    0.000   -1.467   -1.467
##     SSkMa_a|t2       -0.859    0.151   -5.700    0.000   -0.859   -0.859
##     SSkMa_a|t3        0.121    0.150    0.805    0.421    0.121    0.121
##     SSkMa_b|t1       -0.723    0.148   -4.881    0.000   -0.723   -0.723
##     SSkMa_b|t2       -0.084    0.147   -0.572    0.567   -0.084   -0.084
##     SSkMa_b|t3        0.586    0.148    3.961    0.000    0.586    0.586
##     SSkMa_c|t1       -1.553    0.154  -10.081    0.000   -1.553   -1.553
##     SSkMa_c|t2       -0.882    0.151   -5.852    0.000   -0.882   -0.882
##     SSkMa_c|t3        0.225    0.151    1.492    0.136    0.225    0.225
##     SSkMa_d|t1       -1.786    0.157  -11.392    0.000   -1.786   -1.786
##     SSkMa_d|t2       -1.070    0.153   -7.005    0.000   -1.070   -1.070
##     SSkMa_d|t3        0.041    0.152    0.267    0.789    0.041    0.041
##     SSkDe_a|t1       -0.456    0.150   -3.050    0.002   -0.456   -0.456
##     SSkDe_a|t2        0.292    0.148    1.975    0.048    0.292    0.292
##     SSkDe_a|t3        1.367    0.150    9.100    0.000    1.367    1.367
##     SSkDe_b|t1        0.211    0.144    1.461    0.144    0.211    0.211
##     SSkDe_b|t2        0.899    0.144    6.249    0.000    0.899    0.899
##     SSkDe_b|t3        1.672    0.146   11.481    0.000    1.672    1.672
##     SSkDe_c|t1       -1.090    0.158   -6.886    0.000   -1.090   -1.090
##     SSkDe_c|t2       -0.456    0.152   -3.008    0.003   -0.456   -0.456
##     SSkDe_c|t3        0.857    0.153    5.611    0.000    0.857    0.857
##     SSkDe_d|t1       -1.218    0.161   -7.542    0.000   -1.218   -1.218
##     SSkDe_d|t2       -0.515    0.152   -3.382    0.001   -0.515   -0.515
##     SSkDe_d|t3        0.831    0.153    5.418    0.000    0.831    0.831
##     SBezMs_a|t1      -1.361    0.160   -8.500    0.000   -1.361   -1.361
##     SBezMs_a|t2      -0.596    0.152   -3.917    0.000   -0.596   -0.596
##     SBezMs_a|t3       0.722    0.154    4.697    0.000    0.722    0.722
##     SBezMs_b|t1      -0.772    0.144   -5.354    0.000   -0.772   -0.772
##     SBezMs_b|t2      -0.076    0.143   -0.529    0.597   -0.076   -0.076
##     SBezMs_b|t3       1.019    0.146    6.996    0.000    1.019    1.019
##     SBezMs_c|t1      -1.196    0.154   -7.759    0.000   -1.196   -1.196
##     SBezMs_c|t2      -0.728    0.153   -4.752    0.000   -0.728   -0.728
##     SBezMs_c|t3       0.034    0.154    0.223    0.824    0.034    0.034
##     SBezMs_d|t1      -1.203    0.164   -7.316    0.000   -1.203   -1.203
##     SBezMs_d|t2      -0.650    0.160   -4.056    0.000   -0.650   -0.650
##     SBezMs_d|t3       0.091    0.161    0.568    0.570    0.091    0.091
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .SSkMa_a           0.314                               0.314    0.314
##    .SSkMa_b           0.527                               0.527    0.527
##    .SSkMa_c           0.370                               0.370    0.370
##    .SSkMa_d           0.213                               0.213    0.213
##    .SSkDe_a           0.536                               0.536    0.536
##    .SSkDe_b           0.552                               0.552    0.552
##    .SSkDe_c           0.476                               0.476    0.476
##    .SSkDe_d           0.354                               0.354    0.354
##    .SBezMs_a          0.315                               0.315    0.315
##    .SBezMs_b          0.596                               0.596    0.596
##    .SBezMs_c          0.553                               0.553    0.553
##    .SBezMs_d          0.598                               0.598    0.598
##    .wle_lesen         0.718    0.031   22.816    0.000    0.718    0.520
##    .wle_hoeren        0.679    0.026   25.904    0.000    0.679    0.673
##    .wle_mathe         0.350    0.031   11.178    0.000    0.350    0.293
##     SSmath            0.686    0.017   39.302    0.000    1.000    1.000
##     SSgerman          0.464    0.022   21.511    0.000    1.000    1.000
##     SozInt            0.685    0.032   21.401    0.000    1.000    1.000
##    .Abilities         0.378    0.028   13.341    0.000    0.571    0.571
## 
## Scales y*:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     SSkMa_a           1.000                               1.000    1.000
##     SSkMa_b           1.000                               1.000    1.000
##     SSkMa_c           1.000                               1.000    1.000
##     SSkMa_d           1.000                               1.000    1.000
##     SSkDe_a           1.000                               1.000    1.000
##     SSkDe_b           1.000                               1.000    1.000
##     SSkDe_c           1.000                               1.000    1.000
##     SSkDe_d           1.000                               1.000    1.000
##     SBezMs_a          1.000                               1.000    1.000
##     SBezMs_b          1.000                               1.000    1.000
##     SBezMs_c          1.000                               1.000    1.000
##     SBezMs_d          1.000                               1.000    1.000
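
The standardized columns of the SEM output above follow from the unstandardized estimates by simple arithmetic. The sketch below reproduces this for the `SSmath` factor in base R, using the rounded values printed above; the object names are illustrative only. It assumes that the y* scales are fixed to 1, as the "Scales y*" block shows, so that each ordered indicator has a total latent-response variance of 1. Results match the printed output only up to rounding.

```r
# Rounded unstandardized estimates for the SSmath factor,
# copied from the SEM output above.
loadings   <- c(SSkMa_a = 1.000, SSkMa_b = 0.830, SSkMa_c = 0.958, SSkMa_d = 1.072)
var_SSmath <- 0.686

# Std.lv: loadings rescaled so that the latent variable has variance 1.
std_lv <- loadings * sqrt(var_SSmath)

# With y* scales fixed to 1, the residual variance of each latent response
# is 1 minus the squared standardized loading.
resid_var <- 1 - std_lv^2

round(cbind(std_lv, resid_var), 3)

# Share of variance in Abilities explained by its six predictors:
# 1 minus the standardized residual variance of .Abilities (0.571).
r2_abilities <- 1 - 0.571
r2_abilities
```

This also explains why the residual variances of the ordered indicators are printed without standard errors: they are not free parameters but are implied by the loadings and the latent variance.
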
semPlot::semPaths(object = fit, what = "est")
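
The summary above reports 1973 used out of 3005 total observations. A plausible explanation, stated here as an assumption rather than something verified against `datenLV`, is listwise deletion: by default lavaan drops every row that has a missing value on any model variable, and the data preview shows `NA` entries in `Emigr` and `EHisei`. The sketch below illustrates the counting logic on a made-up data frame `toy`; its values and its 0/1 coding of `Emigr` are hypothetical.

```r
# Hypothetical toy data: three model variables, some values missing.
toy <- data.frame(
  wle_lesen = c(-0.17, -0.44, -1.21, 0.26, 0.50),
  Emigr     = c(0, 1, NA, NA, 0),
  EHisei    = c(55, 49, NA, NA, 61)
)
model_vars <- c("wle_lesen", "Emigr", "EHisei")

# Rows that survive listwise deletion: no NA on any model variable.
keep    <- complete.cases(toy[, model_vars])
n_used  <- sum(keep)
n_total <- nrow(toy)
c(used = n_used, total = n_total)

# Missing values per variable show which variables cost the most rows.
n_missing <- colSums(is.na(toy[, model_vars]))
n_missing
```

Applied to the real data, the same two calls with the full set of model variables would show whether the covariates account for the dropped cases.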

3.4 short: item response theory

Oral remarks only, if time permits.

3.5 short: longitudinal data / multi-group analysis

Oral remarks only, if time permits.

References

Bollen, Kenneth. 1989. Structural Equations with Latent Variables. John Wiley.

Costello, Anna, and Jason Osborne. 2005. “Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most from Your Analysis.” Practical Assessment, Research & Evaluation 10 (7): 1–9.

Ditton, Hartmut. 2000. “Qualitätskontrolle und Qualitätssicherung in Schule und Unterricht. Ein Überblick zum Stand der empirischen Forschung.” Zeitschrift für Pädagogik, no. 41.

Grund, Simon, Oliver Lüdtke, and Alexander Robitzsch. 2018. “Multiple Imputation of Missing Data for Multilevel Models: Simulations and Recommendations.” Organizational Research Methods 21 (1): 111–49.

Hair, Joseph F, William C Black, Barry J Babin, and Rolph E Anderson. 2019. Multivariate Data Analysis. Annabel Ainscow.

Hancock, Gregory R, and Ralph Mueller. 2013. Structural Equation Modeling: A Second Course. IAP.

Jaccard, James, and Jacob Jacoby. 2020. Theory Construction and Model-Building Skills: A Practical Guide for Social Scientists. Guilford Publications.

Keller, Florian. 2014. Strukturelle Faktoren des Bildungserfolgs: Wie das Bildungssystem den Übertritt ins Berufsleben bestimmt. Springer-Verlag.

Marsh, Herbert W, Alexandre JS Morin, Philip D Parker, and Gurvinder Kaur. 2014. “Exploratory Structural Equation Modeling: An Integration of the Best Features of Exploratory and Confirmatory Factor Analysis.” Annual Review of Clinical Psychology 10: 85–110.

Moosbrugger, Helfried, and Augustin Kelava. 2020. Testtheorie und Fragebogenkonstruktion. Springer.

Mvududu, Nyaradzo, and Christopher Sink. 2013. “Factor Analysis in Counseling Research and Practice.” Counseling Outcome Research and Evaluation 4 (2): 75–98.

Sijtsma, Klaas. 2009. “On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha.” Psychometrika 74 (1): 107–20.

Stufflebeam, Daniel. 1971. “The Relevance of the CIPP Evaluation Model for Educational Accountability.”